Abstract

An elegant characterization of the complexity of constraint satisfaction problems has emerged
in the form of the algebraic dichotomy conjecture of [BKJ00]. Roughly speaking, the
characterization asserts that a CSP A is tractable if and only if there exist certain non-trivial operations
known as polymorphisms to combine solutions to A to create new ones. In an entirely separate
line of work, the unique games conjecture yields a characterization of approximability of
Max-CSPs. Surprisingly, this characterization for Max-CSPs can also be reformulated in the language
of polymorphisms.

In this work, we study whether existence of non-trivial polymorphisms implies tractability
beyond the realm of constraint satisfaction problems, namely in the value-oracle model.
Specifically, given a function $f$ in the value-oracle model along with an appropriate operation that
never increases the value of $f$, we design algorithms to minimize $f$. In particular, we design a
randomized algorithm to minimize a function $f : [q]^n \to \mathbb{R}$ that admits a fractional polymorphism
which is measure preserving and has a transitive symmetry.

We also reinterpret known results on Max-CSPs and thereby reformulate the unique games
conjecture as a characterization of approximability of Max-CSPs in terms of their approximate
polymorphisms.
1 Introduction


A vast majority of natural computational problems have been classified to be either polynomial-time 
solvable or NP-complete. While there is little progress in determining the exact time complexity for 
fundamental problems like matrix multiplication, it can be argued that a much coarser classification 
of P vs NP-complete has been achieved for a large variety of problems. Notable problems that elude 
such a classification include factorization or graph isomorphism. 

A compelling research direction at this juncture is to understand what causes problems to be 
easy (in P) or hard (NP-complete). More precisely, for specific classes of problems, does there exist 
a unifying theory that explains and characterizes why some problems in the class are in P while 
others are NP-complete? For the sake of concreteness, we will present a few examples. 

It is well-known that 2-Sat is polynomial-time solvable, while 3-Sat is NP-complete. However,
the traditional proofs of these statements are unrelated to each other and therefore shed little
light on what makes 2-Sat easy while 3-Sat is NP-complete. Similar situations arise even when
we consider approximations for combinatorial optimization problems. For instance, Min s-t Cut
can be solved exactly, while Min 3-way Cut is NP-complete and can be approximated to a factor
of $\frac{12}{11}$. It is natural to ask why the approximation ratio for Min s-t Cut is 1, while it is $\frac{12}{11}$ for
Min 3-way Cut.

Over the last decade, unifying theories of tractability have emerged for the class of constraint 
satisfaction problems (CSP) independently for the exact and the optimization variants. Surprisingly, 
the two emerging theories for the exact and optimization variants of CSPs appear to coincide! 
While these candidate theories remain conjectural for now, they successfully explain all the existing 
algorithms and hardness results for CSPs. To set the stage for the results of this paper, we begin 
with a brief survey of the theory for CSPs. 

Satisfiability. A constraint satisfaction problem (CSP) A is specified by a family of predicates 
over a finite domain $[q] = \{1, 2, \ldots, q\}$. Every instance of the CSP A consists of a set of variables
V, along with a set of constraints C on them. Each constraint in C consists of a predicate from the 
family A applied to a subset of variables. For a CSP A, the associated satisfiability problem A-Sat 
is defined as follows. 

Problem 1.1 (A-Sat). Given an instance $\Im$ of the CSP A, determine whether there is an
assignment satisfying all the constraints in $\Im$.

A classic theorem of Schaefer [Sch78] asserts that among all satisfiability problems over the
boolean domain ({0,1}), only Linear-Equations-Mod-2, 2-Sat, Horn-Sat, Dual-Horn-Sat
and certain trivial CSPs are solvable in polynomial time. The rest of the boolean CSPs are
NP-complete. The dichotomy conjecture of Feder and Vardi [FV98] asserts that every A-Sat is in P
or NP-complete. The conjecture has been shown to hold for CSPs over domains of size up to 4 
[Bul06]. 

In this context, it is natural to ask what makes certain A-Sat problems tractable
while the others are NP-complete. Bulatov, Jeavons and Krokhin [BKJ00] conjectured a beautiful
characterization for tractable satisfiability problems. We will present an informal description of
this characterization, known as the algebraic dichotomy conjecture. We refer the reader to the work
of Kun and Szegedy [KS09] for a more formal description.

To motivate this characterization, let us consider a CSP known as the XOR problem. An 
instance of the XOR problem consists of a system of linear equations over $\mathbb{Z}_2 = \{0,1\}$. Fix an
instance $\Im$ of XOR over n variables. Given three solutions $X^{(1)}, X^{(2)}, X^{(3)} \in \{0,1\}^n$ to $\Im$, one can
create a new solution $Y \in \{0,1\}^n$:

$$Y_i = X_i^{(1)} \oplus X_i^{(2)} \oplus X_i^{(3)} \quad \forall i \in [n].$$




It is easy to check that Y is also a feasible solution to the instance $\Im$. Thus the function
XOR : $\{0,1\}^3 \to \{0,1\}$ yields a way to combine three solutions into a new solution for the same instance. Note that
the function XOR was applied to each bit of the solution individually. An operation of this form
that preserves the satisfiability of the CSP is known as a polymorphism. Formally, a polymorphism
of a CSP A-Sat is defined as follows:


Definition 1.2 (Polymorphisms). A function $p : [q]^R \to [q]$ is said to be a polymorphism for the
CSP A-Sat, if for every instance $\Im$ of A and R assignments $X^{(1)}, X^{(2)}, \ldots, X^{(R)} \in [q]^n$ that satisfy
all constraints in $\Im$, the vector $Y \in [q]^n$ defined below is also a feasible solution.

$$Y_i = p(X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(R)}) \quad \forall i \in [n].$$
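
To make the definition concrete, the following is a minimal brute-force sketch in Python; it is an illustration and not part of the original development. It checks, on a single small instance, whether a coordinate-wise operation maps tuples of solutions to solutions (a polymorphism must do so on every instance of the CSP). The helper names `solutions` and `is_polymorphism` and the example system `xor_instance` are hypothetical, and the domain is taken to be {0, ..., q-1} rather than [q] for convenience.

```python
from itertools import product

def solutions(constraints, n, q):
    """Enumerate all assignments in {0..q-1}^n satisfying every constraint.

    Each constraint is a pair (scope, predicate): a tuple of variable indices
    and a boolean function of the values of those variables.
    """
    return [x for x in product(range(q), repeat=n)
            if all(pred(*(x[v] for v in scope)) for scope, pred in constraints)]

def is_polymorphism(p, arity, constraints, n, q):
    """Check on ONE instance that p, applied coordinate-wise, preserves solutions."""
    sols = solutions(constraints, n, q)
    sol_set = set(sols)
    for combo in product(sols, repeat=arity):
        y = tuple(p(*(x[i] for x in combo)) for i in range(n))
        if y not in sol_set:
            return False
    return True

# Hypothetical XOR instance over Z_2:  x0 + x1 = 1,  x1 + x2 + x3 = 0.
xor_instance = [((0, 1), lambda a, b: (a + b) % 2 == 1),
                ((1, 2, 3), lambda a, b, c: (a + b + c) % 2 == 0)]

xor3 = lambda a, b, c: a ^ b ^ c      # the operation from the XOR example
or2 = lambda a, b: a | b              # not a polymorphism of this instance
proj = lambda a, b: a                 # a projection (dictator)
```

On this instance the three-input XOR and the projection pass the check, while OR fails: the OR of the solutions (0,1,1,0) and (1,0,0,0) is (1,1,1,0), which violates the first equation.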


Note that the dictator functions $p(x^{(1)}, \ldots, x^{(R)}) = x^{(i)}$ are polymorphisms for every CSP A-Sat.
These will be referred to as projections or trivial polymorphisms. All the tractable cases of boolean 
CSPs in Schaefer’s theorem are characterized by existence of non-trivial polymorphisms.
Specifically, 2-Sat has the Majority functions, Horn-Sat has AND functions, and Dual-Horn-Sat has
the OR functions as polymorphisms. Roughly speaking, Bulatov et al. [BKJ00] conjectured that
the existence of non-dictator polymorphisms characterizes CSPs that are tractable. Their work 
showed that the set of polymorphisms Poly(A) of a CSP A characterizes the complexity of A-Sat. 
Formally, 


Theorem 1.3. [BKJ00] If two CSPs $A_1, A_2$ have the same set of polymorphisms
$\mathrm{Poly}(A_1) = \mathrm{Poly}(A_2)$, then $A_1$-Sat is polynomial-time reducible to $A_2$-Sat and vice versa.

There are many equivalent ways of formalizing what it means for an operation to be non-trivial 
or non-dictator. A particularly simple way to formulate the algebraic dichotomy conjecture arises 
out of the recent work of Barto and Kozik [BK12]. A polymorphism $p : [q]^k \to [q]$ is called a cyclic
term if

$$p(x_1, \ldots, x_k) = p(x_2, \ldots, x_k, x_1) = \cdots = p(x_k, x_1, \ldots, x_{k-1}) \quad \forall x_1, \ldots, x_k \in [q].$$

Note that the above condition strictly precludes the operation p from being a dictator. 
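
As an illustration of the cyclic condition, here is a small brute-force sketch (our own, with the hypothetical helper name `is_cyclic`, and with the domain written as {0, ..., q-1}). Invariance under a single cyclic shift implies invariance under all of them, so one shift is checked. The sketch tests only the symmetry of the operation; whether the operation is a polymorphism of a given CSP is a separate question.

```python
from itertools import product

def is_cyclic(p, k, q):
    """True if p(x_1, ..., x_k) = p(x_2, ..., x_k, x_1) on all of {0..q-1}^k."""
    return all(p(*x) == p(*(x[1:] + x[:1]))
               for x in product(range(q), repeat=k))

def majority3(a, b, c):
    return 1 if a + b + c >= 2 else 0

def dictator3(a, b, c):
    return a  # projection onto the first input
```

Majority on three bits is cyclic, whereas the dictator is not: it returns 0 on (0,1,1) but 1 on the shifted input (1,1,0).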

Conjecture 1.4. ([BKJ00, MM08, BK12]) A-Sat is in P if A admits a cyclic term, otherwise
A-Sat is NP-complete.

Surprisingly, one of the implications of the conjecture has already been resolved.

Theorem 1.5. [BKJ00, MM08, BK12] A-Sat is NP-complete if A does not admit a cyclic term.

The algebraic approach to understanding the complexity of CSPs has received much attention, 
and the algebraic dichotomy conjecture has been verified for several subclasses of CSPs such as 
conservative CSPs [Bul03], CSPs with no ability to count [BK09] and CSPs with Maltsev operations 
[BD06]. Recently, Kun and Szegedy reformulated the algebraic dichotomy conjecture using analytic 
notions similar to influences [KS09]. 

Polymorphisms in Optimization. Akin to polymorphisms for A-Sat, we define the notion of
approximate polymorphisms for optimization problems. Roughly speaking, an approximate
polymorphism is a probability distribution $\mathcal{P}$ over a set of operations of the form $p : [q]^k \to [q]$. In
particular, an approximate polymorphism $\mathcal{P}$ can be used to combine k solutions to an optimization
problem and produce a probability distribution over new solutions to the same instance. Unlike
the case of exact CSPs, here the polymorphism outputs several new solutions. $\mathcal{P}$ is a
(c, s)-approximate polymorphism if, given a set of solutions with average value c, the average value of
the output solutions is always at least s.
For the sake of exposition, we present an example here. Suppose $f : \{0,1\}^n \to \mathbb{R}^+$ is a
supermodular function. In this case, $f$ admits a 1-approximate polymorphism described below.

$$\mathcal{P} = \begin{cases} \mathrm{OR}(x_1, x_2) & \text{with probability } \tfrac{1}{2} \\ \mathrm{AND}(x_1, x_2) & \text{with probability } \tfrac{1}{2} \end{cases}$$

Given any two assignments $X^{(1)}, X^{(2)}$, the polymorphism $\mathcal{P}$ outputs two solutions $X^{(1)} \vee X^{(2)}$ and
$X^{(1)} \wedge X^{(2)}$. The supermodularity of the function $f$ implies

$$\mathrm{val}(X^{(1)}) + \mathrm{val}(X^{(2)}) \le \mathrm{val}(X^{(1)} \vee X^{(2)}) + \mathrm{val}(X^{(1)} \wedge X^{(2)}),$$

that is, the average value of solutions output by $\mathcal{P}$ is at least the average value of solutions input to
it. The formal definition of an (c, s)-approximate polymorphism is as follows.

Definition 1.6 ((c, s)-approximate polymorphism). Fix a function $f : [q]^n \to \mathbb{R}^+$. A probability
distribution $\mathcal{P}$ over operations $p : [q]^R \to [q]$ is an (c, s)-approximate polymorphism for $f$ if the
following holds: for every R assignments $X^{(1)}, X^{(2)}, \ldots, X^{(R)} \in [q]^n$, if $\mathbb{E}_i f(X^{(i)}) \ge c$
then

$$\mathbb{E}_{p \in \mathcal{P}} \big[ f(p(X^{(1)}, \ldots, X^{(R)})) \big] \ge s.$$

Here, $p(X^{(1)}, \ldots, X^{(R)})$ is the assignment obtained by applying the operation p coordinate-wise.
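
The supermodular example can be checked numerically. The sketch below is our own illustration with hypothetical names (`average_output_value`, `is_fractional_polymorphism_max`, `f_super`, `f_capped`). For a maximization objective it verifies by brute force that the average value of the outputs is at least the average value of the inputs, which is the condition of being (c, c)-approximate for every c.

```python
from itertools import product

def average_output_value(f, ops, xs):
    """E_p[f(p(X^(1), ..., X^(R)))] for p drawn from ops = [(prob, p), ...]."""
    total = 0.0
    for prob, p in ops:
        y = tuple(p(*coords) for coords in zip(*xs))  # apply p coordinate-wise
        total += prob * f(y)
    return total

def is_fractional_polymorphism_max(f, ops, arity, n, q):
    """Brute force: average output value >= average input value on all inputs."""
    points = list(product(range(q), repeat=n))
    for xs in product(points, repeat=arity):
        avg_in = sum(f(x) for x in xs) / arity
        if average_output_value(f, ops, xs) < avg_in - 1e-12:
            return False
    return True

# OR with probability 1/2, AND with probability 1/2.
or_and = [(0.5, lambda a, b: a | b), (0.5, lambda a, b: a & b)]

f_super = lambda x: sum(x) ** 2       # supermodular: convex in the number of ones
f_capped = lambda x: min(sum(x), 1)   # strictly submodular, so the check fails
```

For `f_capped`, the inputs (1,0,0) and (0,1,0) have average value 1, while their OR and AND have values 1 and 0, giving average 1/2.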

It is often convenient to use a coarser notion of approximation, namely the approximation ratio
for all c. In this vein, we define $\alpha$-approximate polymorphisms.

Definition 1.7 ($\alpha$-approximate polymorphism). A probability distribution $\mathcal{P}$ over operations
$p : [q]^R \to [q]$ is an $\alpha$-approximate polymorphism for $f : [q]^n \to \mathbb{R}^+$, if it is a $(c, \alpha \cdot c)$-approximate
polymorphism for all $c \ge 0$.

We will refer to a 1-approximate polymorphism as a fractional polymorphism along the lines of 
Cohen et al. [DMP06]. 


Approximating CSPs. To set the stage for the main result of the paper, we will reformulate 
prior work on approximation of CSPs in the language of polymorphisms. While the results stated 
here are not new, their formulation in the language of approximate polymorphisms has not been 
carried out earlier. This reformulation highlights the close connection between the algebraic
dichotomy conjecture (a theory for exact CSPs) and the unique games conjecture (a theory of 
approximability of CSPs). We believe this reformulation is also one of the contributions of this 
work and have devoted Section 7 to give a full account of it. The results we will state now hold for 
a fairly general class of problems that include both maximization and minimization problems with 
local bounded payoff functions (referred to as value CSP in [CCC+13]). However, for the sake of
concreteness, let us restrict our attention to the problem Max-A for a CSP A. 

Problem 1.8 (Max-A). Given an instance $\Im$ of the A-CSP, find an assignment that satisfies the
maximum number (equivalently fraction) of constraints.

The formal definition of $\alpha$-approximate polymorphisms is as follows.

Definition 1.9. A probability distribution $\mathcal{P}$ over operations $p : [q]^k \to [q]$ is an $\alpha$-approximate
polymorphism for Max-A, if $\mathcal{P}$ is a $(c, \alpha \cdot c)$-approximate polymorphism for every instance $\Im$ of
Max-A.




In the above definition, we treat each instance $\Im$ as a function $\Im : [q]^n \to \mathbb{R}^+$ that we intend to
maximize.

As in the case of A-Sat, dictator functions are 1-approximate polymorphisms for every Max-A.
We will use the analytic notion of influences to preclude the dictator functions (see Section 7).
Specifically, we define $(\tau, d)$-quasirandomness as follows.


Definition 1.10. An approximate polymorphism $\mathcal{P}$ is $(\tau, d)$-quasirandom if, for every distribution
$\mu$ on $[q]$,

$$\mathbb{E}_{p \in \mathcal{P}} \Big[ \max_{i} \; \ldots \Big] \qquad \forall i \in [k]$$


By analogy to A-Sat, it is natural to conjecture that approximate polymorphisms characterize 
the approximability of every Max-A. Indeed, all known tight approximability results for different 
Max-CSPs like 3-Sat correspond exactly to the existence of approximate polymorphisms. Moreover, 
we show that the unique games conjecture [Kho02] is equivalent to asserting that existence of 
quasirandom approximate polymorphisms characterizes the complexity of approximating every CSP. 
To state our results formally, let us define some notation. For a CSP A define the approximation
constant $\alpha_A$ as,

$$\alpha_A = \sup \{ \alpha \in \mathbb{R} \mid \forall \tau, d > 0 \ \ \exists\, (\tau, d)\text{-quasirandom, } \alpha\text{-approximate polymorphism for } A \}.$$

Similarly, for each $c > 0$ define the constant $s_A(c)$ as,

$$s_A(c) \stackrel{\mathrm{def}}{=} \sup \{ s \in \mathbb{R} \mid \forall \tau, d > 0 \ \ \exists\, (\tau, d)\text{-quasirandom, } (c, s)\text{-approximate polymorphism for } A \}.$$


The connection between the unique games conjecture (stated in Section 7) and approximate
polymorphisms is summarized below.

Theorem 1.11 (Connection to UGC). Assuming the unique games conjecture, for every A, it is
NP-hard to approximate Max-A better than $\alpha_A$. Moreover, the unique games conjecture is equivalent
to the following statement: For every A and c, on instances of Max-A with value c, it is NP-hard
to find an assignment with value larger than $s_A(c)$.

All unique games based hardness results for Max-A are shown via constructing appropriate
dictatorship testing gadgets [KKMO07, Rag08]. The above theorem is a consequence of the
connection between approximate polymorphisms and dictatorship testing (see Section 7.4 for details).
The connection between approximate polymorphisms and dictatorship testing was observed in the 
work of Kun and Szegedy [KS09]. 

Surprisingly, the other direction of the correspondence between approximate polymorphisms
and complexity of CSPs also holds. Formally, we show the following result about a semidefinite 
programming relaxation for CSPs referred to as the Basic SDP relaxation (see Section 7.3 for 
definition). 

Theorem 1.12 (Algorithm via Polymorphisms). For every CSP A, the integrality gap of the Basic
SDP relaxation for Max-A is at most $\alpha_A$. More precisely, for every instance $\Im$ of Max-A, for
every c, if the optimum value of the Basic SDP relaxation is at least c, then the optimum value for $\Im$
is at least $\lim_{\epsilon \to 0} s_A(c - \epsilon)$.

In particular, the above theorem implies that the Basic SDP relaxation yields an $\alpha_A$ (or $s_A(c)$)
approximation to Max-A. This theorem is a consequence of restating the soundness analysis of
[Rag08] in the language of approximate polymorphisms. Specifically, one uses the $\alpha$-approximate
polymorphism to construct an $\alpha$-factor rounding algorithm to the semidefinite program for Max-A
(see Section 7.3 for details). In the case of the algebraic dichotomy conjecture, the NP-hardness
result is known, while an efficient algorithm for all CSPs via polymorphisms is not known. In
contrast, the Basic SDP relaxation yields an efficient algorithm for all Max-A, but the
NP-hardness for Max-CSPs is open and equivalent to the unique games conjecture.

The case of Max-A that are solvable exactly (approximation ratio = 1) has also received
considerable attention in the literature. Generalising the algebraic approach to CSPs, algebraic properties
called multimorphisms [CCJK06], fractional polymorphisms [DMP06] and weighted polymorphisms
[CCC+13] have been proposed to study the complexity of classes of valued CSPs. The notion of
$\alpha$-approximate polymorphism for $\alpha = 1$ is closely related to these notions (see Section 7.1).

In a recent work, Thapper and Živný [TZ13] obtained a characterization of Max-A that can
be solved exactly. Specifically, they showed that a valued CSP (a generalization of Max-CSP) is
tractable if and only if it admits a symmetric 1-approximate polymorphism on two inputs. This is
further evidence supporting the claim that approximate polymorphisms characterize
approximability of Max-CSPs.

Beyond CSPs. It is natural to wonder if a similar theory of tractability could be developed for 
classes of problems beyond constraint satisfaction problems. For instance, it would be interesting to 
understand the tractability of Minimum Spanning Tree, Minimum Matching, the intractability 
of TSP and Steiner Tree, and the approximability of Steiner Tree, Metric TSP and Network
Design problems. 

It appears that tractable problems such as Minimum Spanning Tree and Maximum
Matching have certain “operations” on the solution space. For instance, the union of two perfect
matchings contains alternating cycles that could be used to create new matchings. Similarly, spanning
trees form a matroid and therefore given two spanning trees, one could create a sequence of trees 
by exchanging edges amongst them. Furthermore, some algorithms for these problems crucially 
exploit these operations. More indirectly, Minimum Spanning Tree and Maximum Matching 
are connected to submodularity via polymatroids. As we saw earlier, submodularity is an example 
of a 1-approximate polymorphism.

Moreover, here is a heuristic justification for the connection between tractability and
existence of operations to combine solutions. Typically, a combinatorial optimization problem like
Maximum Matching is tractable because of the existence of a linear programming relaxation P
and an associated rounding scheme $\mathcal{R}$ (possibly randomized). Given a set of k integral solutions
$X^{(1)}, X^{(2)}, \ldots, X^{(k)}$, consider any point in their convex hull, say $y = \frac{1}{k} \sum_{i \in [k]} X^{(i)}$. By convexity,
the point y is also a feasible solution to the linear programming relaxation P. Therefore, we
could execute the rounding scheme $\mathcal{R}$ on the LP solution y to obtain a distribution over integral
solutions. Intuitively, y has less information than $X^{(1)}, \ldots, X^{(k)}$ and therefore the solutions
output by the rounding scheme $\mathcal{R}$ should be different from $X^{(1)}, \ldots, X^{(k)}$. This suggests that the
rounding scheme $\mathcal{R}$ yields an operation to combine solutions to create new solutions. Recall that
in case of CSPs, indeed polymorphisms are used to obtain rounding schemes for the semidefinite
programs (see Theorem 1.12). More recently, Barak, Steurer and Kelner [BKS14] show how certain
algorithms to combine solutions can be used towards rounding sum of squares relaxations.

1.1 Our Results 

Algorithmically, one of the fundamental results in combinatorial optimization is polynomial-time
minimization of arbitrary submodular functions. Specifically, there exists an efficient algorithm to
minimize a submodular function $f$ given access to a value oracle for $f$ [Cun81, Cun83, Cun84, Sch00].
Since submodularity is an example of a fractional polymorphism, it is natural to conjecture that
such an algorithm exists whenever $f$ admits a certain fractional polymorphism. Our main result is
an efficient algorithm to minimize a function $f$ given access to its value oracle, provided $f$ admits
an appropriate fractional polymorphism. Towards stating our result formally, let us define some
notation.


Definition 1.13. A fractional polymorphism is said to be measure-preserving if for each $i \in [q]$
and for every choice of inputs, the fraction of inputs equal to i is always equal to the fraction of
outputs equal to i.

Definition 1.14. An operation $p : [q]^k \to [q]$ is said to have a transitive symmetry if for all $i, j \in [k]$
there exists a permutation $\sigma_{ij} \in S_k$ of inputs such that $\sigma_{ij}(i) = j$ and $p \circ \sigma_{ij} = p$. A polymorphism
$\mathcal{P}$ is said to be transitive symmetric if every operation $p \in \mathrm{supp}(\mathcal{P})$ is transitive symmetric.


Notice that cyclic symmetry of the operation is a special case of transitive symmetry. As per 
our definitions, submodularity is a measure-preserving and transitive symmetric polymorphism. 
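
The notion of transitive symmetry can be tested by brute force for small arities. The following sketch is our own illustration; the names `symmetries` and `has_transitive_symmetry` are hypothetical, and inputs range over {0, ..., q-1}. It enumerates all permutations of input positions that leave the operation unchanged and then checks that every position can be moved to every other position by one of them.

```python
from itertools import permutations, product

def symmetries(p, k, q):
    """All permutations s of the k input positions with p(x_s(0), ..., x_s(k-1)) = p(x)."""
    inputs = list(product(range(q), repeat=k))
    return [s for s in permutations(range(k))
            if all(p(*(x[s[j]] for j in range(k))) == p(*x) for x in inputs)]

def has_transitive_symmetry(p, k, q):
    """For every pair (i, j), some symmetry of p sends position i to position j."""
    syms = symmetries(p, k, q)
    return all(any(s[i] == j for s in syms)
               for i in range(k) for j in range(k))

majority3 = lambda a, b, c: 1 if a + b + c >= 2 else 0
dictator3 = lambda a, b, c: a
pairs = lambda a, b, c, d: (a & b) | (c & d)   # symmetric only within and across pairs
```

The operation `pairs` shows that transitive symmetry is weaker than full symmetry: swapping the second and third inputs changes its value on some inputs, yet every position can still be mapped to every other.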


Theorem 1.15. Let $f : [q]^n \to \mathbb{R}^+$ be a function that admits a fractional polymorphism $\mathcal{P}$. If $\mathcal{P}$
is measure preserving and transitive symmetric then there exists an efficient randomized algorithm
$\mathcal{A}$ that for every $\epsilon > 0$, makes $\mathrm{poly}(1/\epsilon, n)$ queries to the values of $f$ and outputs an $x \in [q]^n$ such
that,

$$f(x) \le \min_{y \in [q]^n} f(y) + \epsilon \|f\|_\infty.$$


Apart from submodularity, here is an example of a measure preserving and transitive symmetric 
polymorphism. 


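$$\mathcal{P} = \begin{cases} \mathrm{Majority}(x_1, x_2, x_3) & \text{with probability } 2/3 \\ \mathrm{Minority}(x_1, x_2, x_3) & \text{with probability } 1/3 \end{cases}$$

Here Minority(0,0,0) = 0, Minority(1,1,1) = 1 and $\mathrm{Minority}(x_1, x_2, x_3) = 1 - \mathrm{Majority}(x_1, x_2, x_3)$
on the rest of the inputs. On a larger domain $[q]$, a natural example of a fractional polymorphism
would be

$$\mathcal{P} = \begin{cases} \max(x_1, x_2, x_3) & \text{with probability } 1/3 \\ \min(x_1, x_2, x_3) & \text{with probability } 1/3 \\ \mathrm{median}(x_1, x_2, x_3) & \text{with probability } 1/3 \end{cases}$$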

More generally, it is very easy to construct examples of fractional polymorphisms that satisfy the 
hypothesis of Theorem 1.15. 
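
The measure-preserving property of the examples above can be verified exhaustively. The sketch below is an illustration under one reading of Definition 1.13, namely that the fraction of outputs is weighted by the probabilities of the operations; the function name `is_measure_preserving` is hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

def is_measure_preserving(ops, k, q):
    """ops = [(prob, p), ...] with k-ary operations on {0..q-1}.

    For every input tuple and every symbol i, the fraction of inputs equal to i
    must equal the total probability of the operations whose output is i.
    """
    for x in product(range(q), repeat=k):
        for i in range(q):
            frac_in = Fraction(sum(1 for a in x if a == i), k)
            frac_out = sum(prob for prob, p in ops if p(*x) == i)
            if frac_in != frac_out:
                return False
    return True

def majority(a, b, c):
    return 1 if a + b + c >= 2 else 0

def minority(a, b, c):
    # Agrees with the inputs when they are all equal, otherwise 1 - Majority.
    return a if a == b == c else 1 - majority(a, b, c)

third = Fraction(1, 3)
maj_min = [(2 * third, majority), (third, minority)]
max_min_med = [(third, max), (third, min),
               (third, lambda a, b, c: sorted((a, b, c))[1])]
or_and = [(Fraction(1, 2), lambda a, b: a | b),
          (Fraction(1, 2), lambda a, b: a & b)]
```

Majority on its own fails the check: on input (0,0,1) one third of the inputs equal 1, but the output is never 1.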


1.2 Related Work 

Operations that give rise to tractability in the value oracle model have received considerable
attention in the literature. A particularly successful line of work studies generalizations of submodularity
over various lattices. In fact, submodular functions given by an oracle can be minimised on
distributive lattices [IFF01, Sch00], diamonds [Kui11], and the pentagon [KL08] but the case of general
non-distributive lattices remains open.

An alternate generalization of submodularity known as bisubmodular functions introduced by 
Chandrasekaran and Kabadi [CK88] arises naturally both in theoretical and practical contexts (see 
[FI05, SGB12]). Bisubmodular functions can be minimized in polynomial time given a value oracle 
over domains of size 3 [FI05, Qi88] but the complexity is open on domains of larger size [HK12]. 

Fujishige and Murota [FM00] introduced the notion of $L^\natural$-convex functions, a class of functions
that can also be minimized in the oracle model [Mur04]. In recent work, Kolmogorov [Kol10]
exhibited efficient algorithms to minimize strongly tree-submodular functions on binary trees, which
is a common generalization of $L^\natural$-convex functions and bisubmodular functions.
1.3 Technical Overview


The technical heart of this paper is devoted to understanding the evolution of probability
distributions over $[q]^n$ under iterated applications of operations. Fix a probability distribution $\mu$ over $[q]^n$.
For an operation $p : [q]^k \to [q]$, the distribution $p(\mu)$ over $[q]^n$ is one that is sampled by taking
k independent samples from $\mu$ and applying the operation p to them. Fix a sequence $\{p_i\}_{i=1}^{\infty}$ of
operations with transitive symmetries. Define the dynamical system,

$$\mu_t = p_t(\mu_{t-1}),$$

with $\mu_0 = \mu$. We study the properties of the distribution $\mu_t$ as $t \to \infty$. Roughly speaking, the key
technical insight of this work is that the correlations among the coordinates decay as $t \to \infty$.

For example, let us suppose $\mu_0$ is such that for each $i \in [n]$, the i-th coordinate of $\mu_0$ is not
perfectly correlated with the first $(i-1)$ coordinates. In this case, we will show that $\mu_t$ converges
in statistical distance to a product distribution as $t \to \infty$ (see Theorem 3.3). From an algorithmic
standpoint, this is very valuable because even if $\mu_t$ has no succinct representation, the limiting
distribution $\lim_{t \to \infty} \mu_t$ has a very succinct description. Moreover, since the operations $p_i$ are applied
to each coordinate separately, the marginals of the limiting distribution $\lim_{t \to \infty} \mu_t$ are determined
entirely by the marginals of the initial distribution $\mu_0$. Thus, since the limiting distribution is a
product distribution, it is completely determined by the marginals of the initial distribution $\mu_0$!

Consider an arbitrary probability distribution $\mu$ over $[q]^n$. Let $T_{1-\gamma} \circ \mu$ denote the probability
distribution over $[q]^n$ obtained by sampling from $\mu$ and perturbing each coordinate with a tiny
probability $\gamma$. For small $\gamma$, the statistical distance between $\mu$ and $T_{1-\gamma} \circ \mu$ is small, i.e.,
$\|T_{1-\gamma} \circ \mu - \mu\|_1 \le \gamma n$. However, if we initialize the dynamical system with $\mu_0 = T_{1-\gamma} \circ \mu$ then irrespective of the
starting distribution $\mu$, the limiting distribution is always a product distribution (see Corollary 3.4).
A brief overview of the correlation decay argument is presented in Section 3.2. The details of the
argument are fairly technical and draw upon various analytic tools such as hypercontractivity, the
Berry-Esseen theorem and Fourier analysis (see Section 6.3). A key bottleneck in the analysis is
that the individual marginals change with each iteration thereby changing the Fourier spectrum of
the operations involved.
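
A small exact computation illustrates the perturbation step. The sketch below rests on an assumption that the text does not spell out here: each coordinate is, independently with probability γ, replaced by a fresh draw from that coordinate's own marginal, so that the marginals are unchanged. The names `marginals`, `noise` and `l1` are hypothetical, and the check uses the conservative bound 2γn on the L1 distance.

```python
from itertools import product

def marginals(mu, n, q):
    """One-coordinate marginals of a distribution mu on {0..q-1}^n."""
    marg = [[0.0] * q for _ in range(n)]
    for x, pr in mu.items():
        for i, a in enumerate(x):
            marg[i][a] += pr
    return marg

def noise(mu, gamma, n, q):
    """Perturbed distribution: each coordinate is resampled from its marginal
    with probability gamma (assumed resampling rule), independently."""
    marg = marginals(mu, n, q)
    out = {}
    for x, pr in mu.items():
        for y in product(range(q), repeat=n):
            w = pr
            for i in range(n):
                keep = (1.0 - gamma) if x[i] == y[i] else 0.0
                w *= keep + gamma * marg[i][y[i]]
            if w > 0:
                out[y] = out.get(y, 0.0) + w
    return out

def l1(mu, nu, n, q):
    return sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0))
               for x in product(range(q), repeat=n))

# Perfectly correlated pair of unbiased bits, then a 10% perturbation.
mu = {(0, 0): 0.5, (1, 1): 0.5}
noisy = noise(mu, 0.1, 2, 2)
```

After the perturbation the two bits are no longer perfectly correlated, since the points (0,1) and (1,0) receive positive mass, while the distribution has moved by only 0.19 in L1 distance.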

Recall that the algebraic dichotomy conjecture for exact CSPs asserts that a CSP A admits a 
cyclic polymorphism if and only if the CSP A is in P. It is interesting to note that the correlation 
decay phenomenon applies to cyclic terms. Roughly speaking, the algebraic dichotomy conjecture
could be restated in terms of correlation decay in the above described dynamical system. This 
characterization closely resembles the absorbing subalgebras characterization by Barto and Kozik 
[BK12] derived using entirely algebraic techniques. 

Our approach to prove Theorem 1.15 is as follows. For any function $f : [q]^n \to \mathbb{R}$, one can
define a convex extension $\hat{f}$. Let $\Delta_q$ denote the q-dimensional simplex.

Definition 1.16. (Convex Extension) Given a function $f : [q]^n \to \mathbb{R}$, define its convex extension
$\hat{f} : \Delta_q^n \to \mathbb{R}$ to be

$$\hat{f}(z) \stackrel{\mathrm{def}}{=} \min_{\substack{\text{p.d.f. } \mu \text{ over } [q]^n \\ \mu_i = z_i}} \mathbb{E}_{x \sim \mu} [f(x)]$$

where the minimization is over all probability distributions $\mu$ over $[q]^n$ whose marginals $\mu_i$ coincide
with $z_i$.


The convex extension $\hat{f}(z)$ is the minimum expected value of $f$ under all probability distributions
over $[q]^n$ whose marginals are equal to z. As the name suggests, $\hat{f}$ is a convex function, and minimizing
it is equivalent to minimizing $f$. Since $\hat{f}$ is convex, one could appeal to techniques from convex
optimization such as the ellipsoid algorithm to minimize $\hat{f}$. However, it is in general intractable to
even evaluate the value of $\hat{f}$ at a given point in $\Delta_q^n$. In the case of a submodular function $f$, its
convex extension $\hat{f}$ can be evaluated at a given point via a greedy algorithm, and $\hat{f}$ coincides with
the function known as the Lovász extension of $f$.
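
For the boolean case q = 2, the greedy evaluation mentioned above can be sketched as follows. This is the standard Lovász extension written in our own notation, not code from the paper: a point z is represented by the marginal probabilities z[i] = Pr[x_i = 1], a set function f takes a frozenset of coordinates, and `lovasz_extension` and `cut` are hypothetical names.

```python
def lovasz_extension(f, z):
    """Greedy evaluation at z in [0,1]^n (n >= 1) of the Lovasz extension of f.

    Coordinates are sorted by decreasing z; the chain of prefix sets
    S_0 = {} < S_1 < ... < S_n carries weights 1 - z_(1), z_(1) - z_(2), ..., z_(n),
    a distribution whose marginals are exactly z.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: -z[i])
    value = (1.0 - z[order[0]]) * f(frozenset())
    chain = set()
    for j, i in enumerate(order):
        chain.add(i)
        nxt = z[order[j + 1]] if j + 1 < n else 0.0
        value += (z[i] - nxt) * f(frozenset(chain))
    return value

# Cut function of the path 0 - 1 - 2, a standard submodular function.
edges = [(0, 1), (1, 2)]
def cut(S):
    return sum(1 for u, v in edges if (u in S) != (v in S))
```

At integral points the extension agrees with the function itself, and at z = (1/2, 1/2, 1/2) it evaluates to 0, realized by the distribution that picks the empty set or the full set with probability 1/2 each.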

We exhibit a randomized algorithm to evaluate the value of the convex extension $\hat{f}$ when $f$
admits a fractional polymorphism. Given a point $z \in \Delta_q^n$, the randomized algorithm computes
the minimum expected value of $f$ over all probability distributions $\mu$ whose marginals are equal
to z. Let $\mu$ be the optimal probability distribution that achieves the minimum. Here $\mu$ is an
unknown probability distribution which might not even have a compact representation. Consider
the probability distribution $T_{1-\gamma} \circ \mu$ obtained by resampling each coordinate independently with
probability $\gamma$. The probability distribution $T_{1-\gamma} \circ \mu$ is statistically close to $\mu$ and therefore has
approximately the same expected value of $f$.

Let $\mu'$ be the limiting distribution obtained by iteratively applying the fractional polymorphism
$\mathcal{P}$ to the perturbed optimal distribution $T_{1-\gamma} \circ \mu$. Since $\mathcal{P}$ is a fractional polymorphism, the expected
value of $f$ does not increase on applying $\mathcal{P}$. Therefore, the limiting probability distribution $\mu'$ has
an expected value not much larger than the optimal distribution $\mu$. Moreover, since $\mathcal{P}$ is measure
preserving, the limiting probability distribution has the same marginals as $\mu$. In other words, the
limiting distribution $\mu'$ has marginals equal to z and achieves almost the same expected value as
the unknown optimal distribution $\mu$. By virtue of correlation decay (Corollary 3.4), the limiting
distribution $\mu'$ admits an efficient sampling algorithm that we use to approximately estimate the
value of $\hat{f}(z)$.

2 Background 

We first introduce some basic notation. Let $[q]$ denote the alphabet $[q] = \{1, \ldots, q\}$. Furthermore,
let $\Delta_q$ denote the standard simplex in $\mathbb{R}^q$, i.e.,

$$\Delta_q = \Big\{ x \in \mathbb{R}^q \;\Big|\; x_i \ge 0 \ \forall i, \ \sum_i x_i = 1 \Big\}.$$

For a probability distribution $\mu$ on the finite set $[q]$ we will write $\mu^k$ to denote the product
distribution on $[q]^k$ given by drawing k independent samples from $\mu$.

If $\mu$ is a joint probability distribution on $[q]^n$ we will write $\mu_1, \mu_2, \ldots, \mu_n$ for the n marginal
distributions of $\mu$. Further we will use $\mu^\times$ to denote the product distribution with the same
marginals as $\mu$. That is, we define

$$\mu^\times \stackrel{\mathrm{def}}{=} \mu_1 \times \mu_2 \times \cdots \times \mu_n.$$

An operation p of arity k is a map $p : [q]^k \to [q]$. For a set of k assignments $x^{(1)}, \ldots, x^{(k)} \in [q]^n$,
we will use $p(x^{(1)}, \ldots, x^{(k)}) \in [q]^n$ to be the assignment obtained by applying the operation p on
each coordinate of $x^{(1)}, \ldots, x^{(k)}$ separately. More formally, let $x_j^{(i)}$ be the j-th coordinate of $x^{(i)}$. We
define

$$p(x^{(1)}, \ldots, x^{(k)}) = \Big( p(x_1^{(1)}, \ldots, x_1^{(k)}),\ p(x_2^{(1)}, \ldots, x_2^{(k)}),\ \ldots,\ p(x_n^{(1)}, \ldots, x_n^{(k)}) \Big).$$

More generally, an operation can be thought of as a map $p : [q]^k \to \Delta_q$. In particular, we can think
of p being given by maps $(p_1, \ldots, p_q)$ where $p_i : [q]^k \to \mathbb{R}$ is the indicator

$$p_i(x) = \begin{cases} 1 & : p(x) = i \\ 0 & : p(x) \ne i \end{cases}$$

We next define a method for composing two k-ary operations $p_1$ and $p_2$. The idea is to think of
each of $p_1$ and $p_2$ as nodes with k incoming edges and one outgoing edge. Then we take k copies
of $p_2$ and connect those k outputs to each of the k inputs to $p_1$. Formally, we define:

Definition 2.1. For two operations $p_1 : [q]^{k_1} \to [q]$ and $p_2 : [q]^{k_2} \to [q]$, define an operation
$p_1 \otimes p_2 : [q]^{k_1 \times k_2} \to [q]$ as follows:

$$p_1 \otimes p_2(\{x_{ij}\}_{i \in [k_1], j \in [k_2]}) = p_1\big( p_2(x_{11}, x_{12}, \ldots, x_{1 k_2}), \ldots, p_2(x_{k_1 1}, x_{k_1 2}, \ldots, x_{k_1 k_2}) \big)$$
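
The composition of Definition 2.1 translates directly into code. In the sketch below, which is our own illustration with the hypothetical name `compose`, the grid of inputs is passed flattened in row-major order, so that row i holds the arguments of the i-th copy of the inner operation.

```python
def compose(p1, k1, p2, k2):
    """Return the (k1 * k2)-ary operation p1 (x) p2.

    The inputs form a k1-by-k2 grid given in row-major order; p2 is applied
    to each row and p1 to the k1 resulting values.
    """
    def composed(*xs):
        assert len(xs) == k1 * k2
        rows = [xs[i * k2:(i + 1) * k2] for i in range(k1)]
        return p1(*(p2(*row) for row in rows))
    return composed

majority3 = lambda a, b, c: 1 if a + b + c >= 2 else 0
or2 = lambda a, b: a | b
and2 = lambda a, b: a & b

maj_of_maj = compose(majority3, 3, majority3, 3)   # arity 9
or_of_and = compose(or2, 2, and2, 2)               # arity 4
```

For instance, `maj_of_maj(1, 1, 0, 0, 0, 1, 1, 0, 1)` takes the majority of each of the rows (1,1,0), (0,0,1), (1,0,1) and then the majority of the three results.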

Next we state the definition for polymorphisms of an arbitrary cost function $f : [q]^n \to \mathbb{R}$.
Intuitively, a polymorphism for $f$ is a probability distribution over operations that, on average,
do not increase the value of $f$.

Definition 2.2. A 1-approximate polymorphism $\mathcal{P}$ for a function $f : [q]^n \to \mathbb{R}$ consists of a
probability distribution $\mathcal{P}$ over maps $\mathcal{O} = \{p : [q]^k \to [q]\}$ such that for any set of k assignments
$x^{(1)}, \ldots, x^{(k)} \in [q]^n$,

$$\frac{1}{k} \sum_{i \in [k]} f(x^{(i)}) \ge \mathbb{E}_{p \sim \mathcal{P}} \big[ f(p(x^{(1)}, \ldots, x^{(k)})) \big].$$

Definition 2.3. For a 1-approximate polymorphism $\mathcal{P}$, $\mathcal{P}^{\otimes r}$ denotes the 1-approximate
polymorphism consisting of a distribution over operations defined as $p_1 \otimes \cdots \otimes p_r$ with $p_1, \ldots, p_r$ drawn
i.i.d. from $\mathcal{P}$.

3 Correlation Decay 

In this section we state our main theorem regarding the decay of correlation between random 
variables under repeated applications of transitive symmetric operations. We begin by defining 
a quantitative measure of correlation and using it to bound the statistical distance to a product 
distribution. 

3.1 Correlation and Statistical Distance 

To gain intuition for our measure of correlation consider the example of two boolean random
variables X and Y with joint distribution $\mu$. In this case we will measure correlation by choosing
real-valued test functions $f, g : \{0,1\} \to \mathbb{R}$ and computing $\mathbb{E}[f(X)g(Y)]$. We would like to define
the correlation as the supremum of $\mathbb{E}[f(X)g(Y)]$ over all pairs of appropriately normalized test
functions. There are two components to the normalization. First, we require $\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0$ to
rule out the possibility of adding a large constant to each test function to increase $\mathbb{E}[f(X)g(Y)]$.
Second, we require $\mathrm{Var}[f(X)] = \mathrm{Var}[g(Y)] = 1$ to ensure (by Cauchy-Schwarz) that $\mathbb{E}[f(X)g(Y)] \le 1$.

To see that this notion of correlation makes intuitive sense, suppose X and Y are independent.
In this case correlation is zero because $\mathbb{E}[f(X)g(Y)] = \mathbb{E}[f(X)]\,\mathbb{E}[g(Y)] = 0$. Next suppose that
X = Y = 1 with probability $\frac{1}{2}$ and X = Y = 0 with probability $\frac{1}{2}$. In this case we can set
$f(1) = g(1) = 1$ and $f(0) = g(0) = -1$ to obtain $\mathbb{E}[f(X)g(Y)] = 1$. This matches up with the
intuition that such an X and Y are perfectly correlated. We now give the general definition for our
measure of correlation.

Definition 3.1. Let $X, Y$ be discrete-valued random variables with joint distribution $\mu$. Let
$\Omega_1 = ([q_1], \mu_1)$ and $\Omega_2 = ([q_2], \mu_2)$ denote the probability spaces corresponding to $X, Y$ respectively. The
correlation $\rho(X,Y)$ is given by

\[ \rho(X, Y) := \sup_{\substack{f : [q_1] \to \mathbb{R},\ g : [q_2] \to \mathbb{R} \\ \mathrm{Var}[f(X)] = \mathrm{Var}[g(Y)] = 1 \\ \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0}} \mathbb{E}[f(X)g(Y)] \]

We will interchangeably use the notation $\rho(\mu)$ or $\rho(\Omega_1, \Omega_2)$ to denote the correlation.

Next we show that, as the correlation for a pair of random variables $X$ and $Y$ becomes small,
the variables become nearly independent. In particular, we show that $\rho(X,Y)$ can be used to
bound the statistical distance of $(X,Y)$ from the product distribution where $X$ and $Y$ are sampled
independently.

Lemma 3.2. Let $X, Y$ be discrete-valued random variables with joint distribution $\mu_{XY}$ and
respective marginal distributions $\mu_X$ and $\mu_Y$. If $X$ takes values in $[q_1]$ and $Y$ takes values in $[q_2]$,
then

\[ \|\mu_{XY} - \mu_X \times \mu_Y\|_1 \le \min(q_1, q_2)\, \rho(X,Y) \]

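Proof. Let $\{X_a\}_{a \in [q_1]}$ be indicator variables for the events $X = a$ with $a \in [q_1]$. Similarly, define
the indicator variables $\{Y_b\}_{b \in [q_2]}$.

The statistical distance between $\mu_{XY}$ and $\mu_X \times \mu_Y$ is given by

\[
\begin{aligned}
\|\mu_{XY} - \mu_X \times \mu_Y\|_1 &= \sum_{a \in [q_1],\, b \in [q_2]} \big| \mathbb{P}[X = a, Y = b] - \mathbb{P}[X = a]\,\mathbb{P}[Y = b] \big| \\
&= \sum_{a \in [q_1],\, b \in [q_2]} \big| \mathbb{E}[X_a Y_b] - \mathbb{E}[X_a]\,\mathbb{E}[Y_b] \big|
\end{aligned}
\]

Set $\sigma_{ab} = \mathrm{sign}(\mathbb{E}[X_a Y_b] - \mathbb{E}[X_a]\,\mathbb{E}[Y_b])$ and $Z_a = \sum_{b \in [q_2]} \sigma_{ab} Y_b$. Then

\[
\begin{aligned}
\|\mu_{XY} - \mu_X \times \mu_Y\|_1 &= \sum_{a \in [q_1]} \mathbb{E}[X_a Z_a] - \mathbb{E}[X_a]\,\mathbb{E}[Z_a] \\
&= \sum_{a \in [q_1]} \mathrm{Cov}[X_a, Z_a] \\
&\le \sum_{a \in [q_1]} \sqrt{\mathrm{Var}[X_a]}\, \sqrt{\mathrm{Var}[Z_a]}\, \rho(X,Y) \\
&\le \sum_{a \in [q_1]} \sqrt{\mathrm{Var}[X_a]}\, \rho(X,Y) && (\mathrm{Var}[Z_a] \le 1) \\
&\le q_1\, \rho(X,Y) && (\mathrm{Var}[X_a] \le 1)
\end{aligned}
\]

The result follows by symmetry.

□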
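As a sanity check on Definition 3.1 and Lemma 3.2, here is a small sketch for the boolean case only. It relies on the fact that a binary random variable has, up to sign, a single mean-zero, variance-one test function, so the supremum reduces to the absolute Pearson correlation. The function names are hypothetical and the general $[q_1] \times [q_2]$ case is not covered.

```python
import math

def binary_correlation(mu):
    """rho(X, Y) for mu[x][y] = P[X = x, Y = y] with x, y in {0, 1}."""
    px = mu[1][0] + mu[1][1]          # P[X = 1]
    py = mu[0][1] + mu[1][1]          # P[Y = 1]
    denom = math.sqrt(px * (1 - px) * py * (1 - py))
    if denom == 0:
        return 0.0                    # a constant variable has no normalized test function
    return abs(mu[1][1] - px * py) / denom

def l1_distance_to_product(mu):
    """|| mu_XY - mu_X x mu_Y ||_1 as a sum of absolute differences."""
    px = [mu[0][0] + mu[0][1], mu[1][0] + mu[1][1]]
    py = [mu[0][0] + mu[1][0], mu[0][1] + mu[1][1]]
    return sum(abs(mu[x][y] - px[x] * py[y]) for x in (0, 1) for y in (0, 1))
```

With $q_1 = q_2 = 2$ the lemma predicts `l1_distance_to_product(mu) <= 2 * binary_correlation(mu)` for every joint distribution `mu`.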

3.2 Correlation Decay 

To begin we give an explanation of why one should expect correlation to decay under repeated
applications of symmetric operations. Consider the simple example of two boolean random variables
$X$ and $Y$ with joint distribution $\mu$. Suppose $X = Y$ with probability $\frac{1}{2} + \gamma$ and that the marginal
distributions of both $X$ and $Y$ are uniform on $\{0,1\}$. Let $p : \{0,1\}^k \to \{0,1\}$ be the majority
operation on $k$ bits. That is, $p(x_1, \ldots, x_k) = 1$ if and only if the majority of $x_1, \ldots, x_k$ are one.

Next suppose we draw $k$ samples $(X_i, Y_i)$ from $\mu$ and evaluate $p(X_1, \ldots, X_k)$ and $p(Y_1, \ldots, Y_k)$.
Since the marginal distributions of both $X$ and $Y$ are uniform, the same is true for $p(X_1, \ldots, X_k)$
and $p(Y_1, \ldots, Y_k)$. However, the probability that $p(X_1, \ldots, X_k) = p(Y_1, \ldots, Y_k)$ is strictly less than $\frac{1}{2} + \gamma$.
To see why, first let $F : \{-1,1\}^k \to \{-1,1\}$ be the majority function where $1$ encodes boolean 0
and $-1$ encodes boolean 1. Note that the probability that $F(X_1, \ldots, X_k) = F(Y_1, \ldots, Y_k)$ is given by
$\frac{1}{2} + \frac{1}{2}\mathbb{E}[F(X_1, \ldots, X_k) F(Y_1, \ldots, Y_k)]$.

Now if we write the Fourier expansion of $F$ the above expectation is

\[ \sum_{S,T} \hat{F}_S \hat{F}_T\, \mathbb{E}\Big[ \prod_{i \in S} X_i \prod_{j \in T} Y_j \Big] = \sum_{S} \hat{F}_S^2 \prod_{i \in S} \mathbb{E}[X_i Y_i] = \sum_{S} \hat{F}_S^2 (2\gamma)^{|S|} \]

Suppose first that all the non-zero Fourier coefficients $\hat{F}_S$ have $|S| = 1$. In this case the probability
that $F(X_1, \ldots, X_k) = F(Y_1, \ldots, Y_k)$ stays the same since $\frac{1}{2} + \frac{1}{2}(2\gamma) = \frac{1}{2} + \gamma$. However, in the case of
majority, it is well known that $\sum_{|S|=1} \hat{F}_S^2 < 1 - c$ for a constant $c > 0$. Thus, the expectation is in
fact bounded by

\[ \mathbb{E}[F(X_1, \ldots, X_k) F(Y_1, \ldots, Y_k)] < (1 - c)(2\gamma) + c(2\gamma)^2 < 2\gamma \]

Thus the probability that $F(X_1, \ldots, X_k) = F(Y_1, \ldots, Y_k)$ is strictly less than $\frac{1}{2} + \gamma$. Therefore, if we
repeatedly apply the majority operation, we should eventually have that $X$ and $Y$ become very
close to independent.
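The majority example can be checked exactly for small odd $k$ by enumerating all pairs of inputs. The sketch below is an illustration only; the function names are hypothetical. For $k = 3$ the Fourier expansion of majority has weight $\frac{3}{4}$ on degree one and $\frac{1}{4}$ on degree three, so with $\gamma = \frac{1}{4}$ the agreement probability should drop from $0.75$ to $\frac{1}{2} + \frac{1}{2}\big(\frac{3}{4}\cdot\frac{1}{2} + \frac{1}{4}\cdot\frac{1}{8}\big) = 0.703125$.

```python
from itertools import product

def majority(bits):
    """Majority of an odd number of bits."""
    return int(2 * sum(bits) > len(bits))

def agreement_after_majority(k, gamma):
    """Exact P[Maj(X_1..X_k) = Maj(Y_1..Y_k)] for i.i.d. pairs (X_i, Y_i)
    with uniform marginals and P[X_i = Y_i] = 1/2 + gamma."""
    def pair_prob(x, y):
        return (0.5 + gamma) / 2 if x == y else (0.5 - gamma) / 2

    total = 0.0
    for xs in product((0, 1), repeat=k):
        for ys in product((0, 1), repeat=k):
            if majority(xs) == majority(ys):
                prob = 1.0
                for x, y in zip(xs, ys):
                    prob *= pair_prob(x, y)
                total += prob
    return total
```

Calling `agreement_after_majority(1, 0.25)` recovers the initial agreement probability, since majority on one bit is the identity.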

There are two major obstacles to generalizing the above observation to arbitrary operations 
with transitive symmetry. The first is that for a general operation p, we will not be able to 
explicitly compute the entire Fourier expansion. Instead, we will have to use the fact that p admits 
a transitive symmetry to get a bound on the total Fourier mass on degree-one terms. The second 
obstacle is that, unlike in our example, the marginal distributions of X and Y may change after 
every application of p. This means that the correct Fourier basis to use also changes after every 
application of p. 

The fact that the marginal distributions change under $p$ causes difficulties even for the simple
example of the boolean OR operation on two bits. Consider a highly biased distribution over $\{0,1\}$
given by $X = 1$ with probability $\varepsilon$ and $X = 0$ with probability $1 - \varepsilon$. Now consider the function
$f(X) = \frac{1}{2}(X_1 + X_2)$. Note that this function agrees with OR except when $X_1 \neq X_2$. Thus,
$f(X) = \mathrm{OR}(X)$ with probability $1 - 2\varepsilon(1 - \varepsilon) > 1 - 2\varepsilon$. This means that as $\varepsilon$ approaches zero, OR
approaches a function $f$ with

\[ \sum_{|S| = 1} \hat{f}_S^2 = 1 \]

Thus, there are distributions for which the correlation decay under the OR operation approaches
zero. This means that we cannot hope to prove a universal bound on correlation decay for every
marginal distribution, even in this very simple case. It is useful to note that for the OR operation,
the probability that $X = 1$ increases under every application. Thus, as long as the initial distribution
has a non-negligible probability that $X = 1$, we will have that correlation does indeed decay
in each step. Of course, this particular observation applies only to the OR operation. However, our
proof in the general case does rely on the fact that, using only properties of the initial distribution
of $X$, we can get bounds on correlation decay in every step.
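The OR example can be quantified directly. The sketch below (hypothetical function name, two input bits only) expands OR in the orthonormal basis of the $\varepsilon$-biased product measure and reports the fraction of its variance that sits on degree-one terms. A short calculation gives the closed form $\frac{2(1-\varepsilon)}{2-\varepsilon}$, which tends to one as $\varepsilon \to 0$.

```python
import math
from itertools import product

def degree_one_fraction_of_or(eps):
    """Fraction of Var[OR(X1, X2)] on degree-one terms when P[X_i = 1] = eps."""
    weight = {0: 1 - eps, 1: eps}
    sd = math.sqrt(eps * (1 - eps))

    def phi(x):
        # Mean-zero, variance-one function of a single biased bit.
        return (x - eps) / sd

    basis = {
        (1, 0): lambda x: phi(x[0]),
        (0, 1): lambda x: phi(x[1]),
        (1, 1): lambda x: phi(x[0]) * phi(x[1]),
    }
    coeff = {}
    for alpha, b in basis.items():
        coeff[alpha] = sum(
            weight[x[0]] * weight[x[1]] * (x[0] | x[1]) * b(x)
            for x in product((0, 1), repeat=2)
        )
    degree_one = coeff[(1, 0)] ** 2 + coeff[(0, 1)] ** 2
    variance = degree_one + coeff[(1, 1)] ** 2
    return degree_one / variance
```

For the unbiased case `degree_one_fraction_of_or(0.5)` the fraction is $\frac{2}{3}$, while for small `eps` it is close to one, in line with the discussion above.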

In summary, we are able to achieve correlation decay for arbitrary transitive symmetric
operations. We now state our main theorem to this effect.

Theorem 3.3. (Correlation Decay) Let $\mu$ be a distribution on $[q]^n$. Let $X_1, \ldots, X_n$ be the jointly
distributed $[q]$-valued random variables drawn from $\mu$. Let $\rho = \max_i \rho(X_i, (X_1, \ldots, X_{i-1})) < 1$ and let
$\lambda$ be the minimum probability of an atom in the marginal distributions $\{\mu_i\}_{i \in [n]}$. For any $\eta > 0$,
the following holds for $r \ge \Omega_q\big(\frac{\log \lambda}{\log \rho} \log^2(\frac{n}{\eta})\big)$: if $p_1, \ldots, p_r : [q]^k \to [q]$ is a sequence of operations
each of which admits a transitive symmetry, then

\[ \| p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu) - p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu^{\times}) \|_1 \le \eta \]

We defer the proof of the theorem to Section 6. Note that the theorem only applies when $\rho < 1$,
i.e. when there are no perfect correlations between the $X_i$. To ensure that a distribution has no
perfect correlations we can introduce a small amount of noise.

Noise. For a probability distribution $\mu$ on $[q]^n$ let $T_{1-\gamma} \circ \mu$ denote the probability distribution
over $[q]^n$ defined as:

• Sample $X \in [q]^n$ from the distribution $\mu$.

• For each $i \in [n]$, set

\[ \tilde{X}_i = \begin{cases} X_i & \text{with probability } 1 - \gamma \\ \text{an independent sample from } \mu_i & \text{with probability } \gamma \end{cases} \]

We prove the following corollary that gives correlation decay for distributions with a small amount
of added noise.

Corollary 3.4. Let $\mu$ be any probability distribution over $[q]^n$ and let $\lambda$ denote the minimum
probability of an atom in the marginals $\{\mu_i\}_{i \in [n]}$. For any $\gamma, \eta > 0$, given a sequence
$p_1, \ldots, p_r : [q]^k \to [q]$ of operations with transitive symmetry with $r \ge \Omega_q\big(\frac{\log \lambda}{\log(1-\gamma)} \log^2(\frac{n}{\eta})\big)$,

\[ \| p_1 \otimes p_2 \otimes \cdots \otimes p_r(T_{1-\gamma} \circ \mu) - p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu^{\times}) \|_1 \le \eta \]

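Proof. Let $\tilde{X}_1, \ldots, \tilde{X}_n$ denote the random variables drawn from $T_{1-\gamma} \circ \mu$. Let $f : [q] \to \mathbb{R}$ and
$g : [q]^{i-1} \to \mathbb{R}$ be functions with $\mathbb{E}[f(\tilde{X}_i)] = \mathbb{E}[g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})] = 0$ and
$\mathbb{E}[f(\tilde{X}_i)^2] = \mathbb{E}[g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})^2] = 1$ such that

\[ \mathbb{E}[f(\tilde{X}_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})] = \rho(\tilde{X}_i, (\tilde{X}_1, \ldots, \tilde{X}_{i-1})) \]

That is, $f$ and $g$ achieve the maximum possible correlation between $\tilde{X}_i$ and $(\tilde{X}_1, \ldots, \tilde{X}_{i-1})$. Let $Y_i$
be an independent random sample from the marginal $\mu_i$. Now we expand the above expectation
by conditioning on whether or not $\tilde{X}_i$ was obtained by re-sampling from the marginal $\mu_i$.

\[ \mathbb{E}[f(\tilde{X}_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})] = \mathbb{E}[f(X_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})](1 - \gamma) + \mathbb{E}[f(Y_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})]\,\gamma \]

By the Cauchy-Schwarz inequality, the first term above is bounded by

\[ \mathbb{E}[f(X_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})](1 - \gamma) \le \sqrt{\mathbb{E}[f(X_i)^2]\; \mathbb{E}[g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})^2]}\,(1 - \gamma) = 1 - \gamma \]

where we have used the fact that $\mathbb{E}[f(X_i)^2] = \mathbb{E}[f(\tilde{X}_i)^2] = 1$. Since $Y_i$ is independent of $\tilde{X}_1, \ldots, \tilde{X}_{i-1}$,
the second term is

\[ \mathbb{E}[f(Y_i)\, g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})]\,\gamma = \mathbb{E}[f(Y_i)]\; \mathbb{E}[g(\tilde{X}_1, \ldots, \tilde{X}_{i-1})]\,\gamma = 0 \]

Thus, we get $\rho(\tilde{X}_i, (\tilde{X}_1, \ldots, \tilde{X}_{i-1})) \le 1 - \gamma$ for all $i \in [n]$. The result then follows by applying
Theorem 3.3 to $T_{1-\gamma} \circ \mu$. □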

4 Optimization in the Value Oracle Model 

In this section, we will describe an algorithm to minimize a function $f : [q]^n \to \mathbb{R}$ that admits a
1-approximate polymorphism given access to a value oracle for $f$. We begin by setting up some
notation.

Recall that for a finite set $A$, $\triangle_A$ denotes the set of all probability distributions over $A$. For
notational convenience, we will use the following shorthand for the expectation of the function $f$
over a distribution $\mu$.

Definition 4.1. (Distributional Extension) Given a function $f : [q]^n \to \mathbb{R}$, define its distributional
extension $F : \triangle_{[q]^n} \to \mathbb{R}$ to be

\[ F(\mu) := \mathbb{E}_{x \sim \mu}[f(x)] \]

Given an operation $p : [q]^k \to [q]$ and a probability distribution $\mu \in \triangle_{[q]^n}$, define $p(\mu)$ to be the
distribution over $[q]^n$ that is sampled as follows:

• Sample $x_1, x_2, \ldots, x_k \in [q]^n$ independently from the distribution $\mu$.

• Output $p(x_1, x_2, \ldots, x_k)$.

Notice that for each coordinate $i \in [n]$, the $i$-th marginal distribution $(p(\mu))_i$ is given by $p(\mu_i)$. More
generally, if $\mathcal{P}$ denotes a probability distribution over operations then $\mathcal{P}^{\otimes r}(\mu)$ is a distribution
over $[q]^n$ that is sampled as follows:

• Sample operations $p_1, \ldots, p_r : [q]^k \to [q]$ independently from the distribution $\mathcal{P}$.

• Output a sample from $p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu)$.
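The two sampling procedures above translate into a short recursive sampler. This is a naive sketch with hypothetical names: a sample from $p_1 \otimes \cdots \otimes p_r(\mu)$ is obtained by applying $p_1$ coordinate-wise to $k$ independent samples from $p_2 \otimes \cdots \otimes p_r(\mu)$, so it draws $k^r$ samples from $\mu$ and is only practical for small $r$.

```python
import random

def sample_tensor(ops, k, sample_mu):
    """Sample from p_1 (x) ... (x) p_r (mu), where ops = [p_1, ..., p_r]."""
    if not ops:
        return tuple(sample_mu())
    # k independent samples from p_2 (x) ... (x) p_r (mu) ...
    inner = [sample_tensor(ops[1:], k, sample_mu) for _ in range(k)]
    # ... combined coordinate by coordinate with p_1.
    return tuple(ops[0](*column) for column in zip(*inner))

def sample_power(sample_op, r, k, sample_mu):
    """Sample from P^{(x) r}(mu): draw p_1, ..., p_r i.i.d. from P first."""
    ops = [sample_op() for _ in range(r)]
    return sample_tensor(ops, k, sample_mu)
```

Here `sample_op` is assumed to return a $k$-ary Python callable drawn from $\mathcal{P}$, and `sample_mu` a tuple in $[q]^n$ drawn from $\mu$.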

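Suppose $\mathcal{P}$ is a fractional polymorphism for the function $f$. By definition of fractional
polymorphisms, the average value of $f$ does not increase on applying the operations sampled from $\mathcal{P}$. More
precisely, we can prove the following.

Lemma 4.2. For every distribution $\mu$ on $[q]^n$, every fractional polymorphism $\mathcal{P}$ of a function
$f : [q]^n \to \mathbb{R}$ and every $r \in \mathbb{N}$, $F(\mathcal{P}^{\otimes r}(\mu)) \le F(\mu)$, where $F$ is the distributional extension of $f$.

Proof. We will prove this result by induction on $r$. First, we prove the result for $r = 1$.

\[
\begin{aligned}
F(\mathcal{P}(\mu)) &= \mathbb{E}_{x \sim \mathcal{P}(\mu)}[f(x)] \\
&= \mathbb{E}_{p \sim \mathcal{P}}\; \mathbb{E}_{x_1, \ldots, x_k \sim \mu}[f(p(x_1, \ldots, x_k))] \\
&= \mathbb{E}_{x_1, \ldots, x_k \sim \mu}\Big[ \mathbb{E}_{p \sim \mathcal{P}}[f(p(x_1, \ldots, x_k))] \Big] \\
&\le \mathbb{E}_{x_1, \ldots, x_k \sim \mu}\Big[ \frac{1}{k} \sum_{i} f(x_i) \Big] = F(\mu)
\end{aligned}
\]

The last inequality in the above calculation uses the fact that $\mathcal{P}$ is a fractional polymorphism for
$f$. Suppose the assertion is true for $r$; now we will prove the result for $r + 1$. Observe that the
distribution $\mathcal{P}^{\otimes r+1}(\mu)$ can be written as

\[ \mathcal{P}^{\otimes r+1}(\mu) = \mathbb{E}_{p_2, \ldots, p_{r+1} \sim \mathcal{P}}\big[ \mathcal{P}(p_2 \otimes \cdots \otimes p_{r+1}(\mu)) \big] \]

where $p_2, \ldots, p_{r+1}$ are drawn independently from the distribution $\mathcal{P}$. Hence we can write

\[
\begin{aligned}
F(\mathcal{P}^{\otimes r+1}(\mu)) &= \mathbb{E}_{p_2, \ldots, p_{r+1} \sim \mathcal{P}}\big[ F(\mathcal{P}(p_2 \otimes \cdots \otimes p_{r+1}(\mu))) \big] \\
&\le \mathbb{E}_{p_2, \ldots, p_{r+1} \sim \mathcal{P}}\big[ F(p_2 \otimes \cdots \otimes p_{r+1}(\mu)) \big] && \text{using the base case} \\
&= F(\mathcal{P}^{\otimes r}(\mu)) \le F(\mu),
\end{aligned}
\]

where the last inequality used the induction hypothesis for $r$. □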
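The base case of Lemma 4.2 can be checked exactly on toy instances by computing the push-forward distribution $\mathcal{P}(\mu)$ explicitly. The sketch below is illustrative only: distributions are represented as dictionaries from points to probabilities, and the helper names are hypothetical.

```python
from itertools import product

def distributional_extension(f, mu):
    """F(mu) = E_{x ~ mu}[f(x)] for mu given as {point: probability}."""
    return sum(prob * f(x) for x, prob in mu.items())

def push_forward(weighted_ops, k, mu):
    """Exact distribution P(mu): draw p ~ P and x_1..x_k i.i.d. from mu,
    then output p(x_1, ..., x_k) applied coordinate-wise."""
    out = {}
    for w, p in weighted_ops:
        for draws in product(mu.items(), repeat=k):
            prob = w
            for _, point_prob in draws:
                prob *= point_prob
            points = [x for x, _ in draws]
            y = tuple(p(*column) for column in zip(*points))
            out[y] = out.get(y, 0.0) + prob
    return out

# Hypothetical example: the cut function of a single edge with P uniform on {AND, OR}.
edge_cut = lambda x: int(x[0] != x[1])
ops = [(0.5, lambda a, b: a & b), (0.5, lambda a, b: a | b)]
mu = {(0, 1): 0.5, (1, 0): 0.5}
```

In this example $F(\mu) = 1$ while $F(\mathcal{P}(\mu)) = \frac{1}{2}$, consistent with the lemma.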

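Recall that $\mu^{\times}$ is the product distribution with the same marginals as $\mu$. We will show the
following using correlation decay.

Lemma 4.3. Let $\mathcal{P}$ be a fractional polymorphism with a transitive symmetry for a function
$f : [q]^n \to \mathbb{R}$. Let $\mu$ be a probability distribution over $[q]^n$ and let $\lambda$ denote the minimum probability
of an atom in the marginals $\{\mu_i\}_{i \in [n]}$. For $\gamma = \frac{\delta}{2n}$ and $r = \Omega_q\big(\frac{\log \lambda}{\log(1-\gamma)} \log^2(\frac{n}{\delta})\big)$,

\[ F(\mathcal{P}^{\otimes r}(\mu^{\times})) \le F(\mu) + \delta \|f\|_\infty \]

Proof. Consider the distribution $T_{1-\gamma} \circ \mu$. By definition of $T_{1-\gamma} \circ \mu$, all the correlations within
$T_{1-\gamma} \circ \mu$ are at most $1 - \gamma$. Roughly speaking, with repeated applications of the operations from
$\mathcal{P}$ all the correlations will vanish. More precisely, by Corollary 3.4, for any sequence of operations
$p_1, \ldots, p_r$ with transitive symmetry we have

\[ \| p_1 \otimes p_2 \otimes \cdots \otimes p_r(T_{1-\gamma} \circ \mu) - p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu^{\times}) \|_1 \le \frac{\delta}{2} \tag{4.1} \]

Recall that $\mathcal{P}^{\otimes r}(\mu')$ for a distribution $\mu'$ is given by $\mathcal{P}^{\otimes r}(\mu') = \mathbb{E}_{p_1, \ldots, p_r}[p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu')]$.
Averaging (4.1) over all choices of $p_1, \ldots, p_r$ from $\mathcal{P}$ we get

\[ \| \mathcal{P}^{\otimes r}(T_{1-\gamma} \circ \mu) - \mathcal{P}^{\otimes r}(\mu^{\times}) \|_1 \le \frac{\delta}{2} \tag{4.2} \]

Now we are ready to finish the proof of the lemma.

\[
\begin{aligned}
F(\mathcal{P}^{\otimes r}(\mu^{\times})) &\le F(\mathcal{P}^{\otimes r}(T_{1-\gamma} \circ \mu)) + \frac{\delta}{2} \|f\|_\infty && \text{using (4.2)} \\
&\le F(T_{1-\gamma} \circ \mu) + \frac{\delta}{2} \|f\|_\infty && \text{using Lemma 4.2} \\
&\le F(\mu) + \delta \|f\|_\infty && \text{using } \|\mu - T_{1-\gamma} \circ \mu\|_1 \le \gamma n
\end{aligned}
\]

□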

Suppose we are looking to minimize the function $f$ or equivalently the distributional extension
$F$. In general, the minimizer of $F$ could be an arbitrary distribution $\mu$ with no succinct
(polynomial-sized) representation. The preceding lemma shows that $\mathcal{P}^{\otimes r}(\mu^{\times})$ has roughly the same value of
$F$. Moreover, $\mathcal{P}^{\otimes r}(\mu^{\times})$ not only has a succinct representation, but is also efficiently sampleable given the
marginals of $\mu$. In the rest of this section, we will use this insight towards minimizing $f$ given a
value oracle.


4.1 Convex Extension 

Definition 4.4. (Convex extension) For a function $f : [q]^n \to \mathbb{R}$, the convex extension
$\hat{f} : (\triangle_q)^n \to \mathbb{R}$ is defined as

\[ \hat{f}(w) := \min_{\substack{\mu \in \triangle_{[q]^n} \\ \mu_i = w_i\ \forall i \in [n]}} \mathbb{E}_{x \sim \mu}[f(x)], \]

where the minimization is over all probability distributions $\mu$ over $[q]^n$ whose marginals are given
by $w \in (\triangle_q)^n$.

As the name suggests, $\hat{f}$ is a convex function whose minimum is equal to the minimum of $f$.
In general, the convex extension of a function $f$ cannot be computed efficiently since the optimal
distribution $\mu$ might not even have a succinct representation. In our case, however, we can prove:


Theorem 4.5. Suppose a function $f : [q]^n \to \mathbb{R}$ admits a fractional polymorphism $\mathcal{P}$ such that: (1)
each operation $p : [q]^k \to [q]$ in the support of $\mathcal{P}$ has a transitive symmetry, (2) the polymorphism
$\mathcal{P}$ is measure preserving. Then there is an algorithm that, given $\varepsilon > 0$ and $w \in \triangle_q^n$, runs in time
$\mathrm{poly}(n, \frac{1}{\varepsilon})$ and computes $\hat{f}(w) \pm \varepsilon \|f\|_\infty$.


Proof. Given $w \in \triangle_q^n$, we first perturb every coordinate slightly to ensure that the minimum
probability in each marginal $w_i$ is bounded away from 0. In particular, we define $w'$ by setting

\[ b = \frac{\varepsilon}{10qn} \qquad \text{and} \qquad w'_i(a) = \frac{w_i(a) + b}{1 + qb} \]

for all $a \in [q]$ and $i \in [n]$. Clearly $w' \in \triangle_q^n$ and $w'_i(a) \ge \frac{b}{1 + qb}$. Now let

\[ \mu = \operatorname*{argmin}_{\substack{\mu \in \triangle_{[q]^n} \\ \mu_i = w'_i\ \forall i \in [n]}} \mathbb{E}_{x \sim \mu}[f(x)] \]

denote the optimal distribution over $[q]^n$ that achieves the minimum in $\hat{f}(w')$. Fix $\delta = \frac{\varepsilon}{2}$ and note
that

\[ \frac{\log \lambda}{\log(1 - \gamma)} = O\Big( \frac{n}{\varepsilon} \log(n\varepsilon^{-1}) \Big) \]

Now we claim that for $r = \Omega\big(\frac{n}{\varepsilon} \log^3(n\varepsilon^{-1})\big)$ we have

\[ F(\mathcal{P}^{\otimes r}(\mu^{\times})) - \delta \|f\|_\infty \le \hat{f}(w') = F(\mu) \le F(\mathcal{P}^{\otimes r}(\mu^{\times})). \]

The left-hand inequality follows from Lemma 4.3. The right-hand inequality follows because
$\mathcal{P}^{\otimes r}(\mu^{\times})$ has its marginals equal to $w'$, since $\mathcal{P}$ is measure preserving, and $\mu$ is the optimal distribution
with marginals $w'$. Moreover, observe that the distribution $\mathcal{P}^{\otimes r}(w')$ can be sampled efficiently as
described below.

Input: $w' \in \triangle_q^n$

Output: Sample from $\mathcal{P}^{\otimes r}(w')$

• Fix $z_0 = w'$

• For $i = 1$ to $r$ do,

  – Sample $p \sim \mathcal{P}$ and set $z_i = p(z_{i-1})$.

• Sample $x \in [q]^n$ by sampling each coordinate independently from $z_r$.
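The boxed procedure can be sketched as follows, reading $z_i = p(z_{i-1})$ coordinate-wise: each marginal is replaced by the distribution of $p(a_1, \ldots, a_k)$ for $a_1, \ldots, a_k$ drawn i.i.d. from it. This is one interpretation of the procedure as stated, not the authors' implementation. The names are hypothetical, and the marginal update enumerates all $q^k$ argument tuples, which assumes $q$ and $k$ are small constants.

```python
import random
from itertools import product

def push_marginal(p, k, z):
    """Distribution of p(a_1, ..., a_k) for a_j i.i.d. from z (a list of q probabilities)."""
    q = len(z)
    out = [0.0] * q
    for args in product(range(q), repeat=k):
        prob = 1.0
        for a in args:
            prob *= z[a]
        out[p(*args)] += prob
    return out

def sample_product_power(w, sample_op, k, r, rng=random):
    """Follow the boxed steps: z_0 = w', z_i = p(z_{i-1}), then sample from z_r."""
    z = [list(wi) for wi in w]                      # z_0 = w'
    for _ in range(r):
        p = sample_op()                             # p ~ P
        z = [push_marginal(p, k, zi) for zi in z]   # z_i = p(z_{i-1}), per coordinate
    # Sample every coordinate independently from its final marginal.
    return tuple(rng.choices(range(len(zi)), weights=zi)[0] for zi in z)
```

For example, applying OR to the marginal `[0.9, 0.1]` yields `[0.81, 0.19]`, since the output is zero only when both inputs are zero.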


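In order to estimate $\hat{f}(w')$, it is sufficient to sample $\mathcal{P}^{\otimes r}(w')$ independently $O\big(\frac{1}{\varepsilon^2}\big)$ times to
compute $F(\mathcal{P}^{\otimes r}(\mu^{\times}))$ to accuracy within $\frac{\varepsilon}{4}\|f\|_\infty$ with high probability. Thus, we can estimate
$\hat{f}(w')$ to accuracy $(\delta + \frac{\varepsilon}{4})\|f\|_\infty = \frac{3\varepsilon}{4}\|f\|_\infty$.

Next let $\pi$ be the distribution that achieves the optimum for marginals $w$, i.e. $\hat{f}(w) = F(\pi)$. By
changing each marginal of $\pi$ by at most $b$ we obtain a distribution $\pi'$ with marginals given by $w'$.
By optimality of $\mu$ we know

\[ \hat{f}(w') = F(\mu) \le F(\pi') = F(\pi) \pm bqn \|f\|_\infty = \hat{f}(w) \pm \frac{\varepsilon}{10} \|f\|_\infty \]

By a symmetric argument we get that $\hat{f}(w) \le \hat{f}(w') \pm \frac{\varepsilon}{10}\|f\|_\infty$. Thus, we conclude that
$|\hat{f}(w) - \hat{f}(w')| \le \frac{\varepsilon}{10}\|f\|_\infty$. Therefore we can estimate $\hat{f}(w)$ to accuracy $(\frac{3\varepsilon}{4} + \frac{\varepsilon}{10})\|f\|_\infty \le \varepsilon\|f\|_\infty$. In summary,
this yields an algorithm running in time $O_q\big(n^2 \varepsilon^{-3} \log^3(n\varepsilon^{-1})\big)$ to estimate $\hat{f}(w)$ within an error of
$\varepsilon\|f\|_\infty$ with high probability. □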

Gradient-Free Minimization. In the previous section, we have demonstrated that the convex
extension $\hat{f} : \triangle_q^n \to \mathbb{R}$ can be computed efficiently. In order to complete the proof of our main
theorem (Theorem 1.15), we will exhibit an algorithm to minimize the convex function $\hat{f}$. The
domain of the convex function $\hat{f}$, namely $\triangle_q^n$, is particularly simple. However, the convex function $\hat{f}$
is not necessarily smooth (differentiable). More importantly, we only have an evaluation oracle for
the function $\hat{f}$ with no access to its gradients or subgradients. Gradient-free optimization has been
extensively studied (see [NES11, Spa97] and references therein) and numerous approaches
have been proposed in the literature. The basic idea is to estimate the gradient via one
or more function evaluations in the neighborhood. In this work, we will appeal to the randomized
algorithms based on Gaussian perturbations proposed by Nesterov [NES11]. The following result is a
restatement of Theorem 5 in the work of Nesterov [NES11].

Theorem 4.6. Let $f : \mathbb{R}^N \to \mathbb{R}$ be a non-smooth convex function given by an evaluation oracle $\mathcal{O}$.
Let us suppose $f$ is $L$-Lipschitz, i.e.,

\[ |f(x) - f(y)| \le L \|x - y\| \qquad \forall x, y \in \mathbb{R}^N \]

Let $Q \subseteq \mathbb{R}^N$ be a closed convex set given by a projection oracle $\pi_Q : \mathbb{R}^N \to Q$. Let
$\Delta(Q) := \sup_{x, y \in Q} \|x - y\|$.

There is a randomized algorithm that, given $\varepsilon > 0$, with constant probability finds $y \in Q$ such
that

\[ \big| f(y) - \min_{x \in Q} f(x) \big| \le \varepsilon, \]

using

\[ \frac{(N + 4) L^2 \Delta(Q)^2}{\varepsilon^2} \]

calls to the oracle $\mathcal{O}$ and $\pi_Q$.
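To illustrate the kind of method Theorem 4.6 refers to, here is a minimal random-search sketch in the spirit of Gaussian smoothing: a finite difference along a random Gaussian direction stands in for the gradient, followed by a projection onto $Q$. The step size `h`, smoothing parameter `mu`, iteration count and the rule of returning the best iterate seen are illustrative assumptions. They are not the parameter choices analyzed in [NES11].

```python
import random

def random_search_minimize(f, project, x0, steps, h, mu, rng):
    """Projected gradient-free descent using only evaluations of f.

    f       : evaluation oracle on lists of floats
    project : projection oracle onto the feasible set Q
    rng     : a random.Random instance
    """
    x = project(list(x0))
    best_x, best_val = list(x), f(x)
    for _ in range(steps):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - f(x)) / mu          # directional finite difference
        x = project([xi - h * slope * ui for xi, ui in zip(x, u)])
        value = f(x)
        if value < best_val:                      # keep the best iterate seen so far
            best_x, best_val = list(x), value
    return best_x, best_val
```

As a usage example, minimizing the non-smooth function $|x - 0.3|$ over $Q = [0, 1]$ with `h = 0.01` and `mu = 1e-3` drives the best value close to zero within a couple of thousand steps.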

In our setup, the domain $Q = \triangle_q^n \subset \mathbb{R}^{nq}$ is a very simple closed convex set with diameter
$\Delta(Q) \le \sqrt{2n}$. The convex extension $\hat{f}$ is $L$-Lipschitz for $L = \|f\|_\infty$. Finally, we turn to the proof
of our main theorem, which follows more or less immediately from the results in this section.

Proof of Theorem 1.15. By Theorem 4.5 we can estimate the convex function $\hat{f}(w)$ with sufficient
accuracy to apply Theorem 4.6 to approximately minimize $\hat{f}(w)$ in time $\mathrm{poly}(1/\varepsilon, n, \|f\|_\infty)$. Since
$\hat{f}(w)$ is the convex extension of $f$, the output of the algorithm approximately minimizing $\hat{f}(w)$ will
also approximately minimize $f$. □

5 Analytic Background 

In this section we introduce the analytic background necessary to prove Theorem 3.3. 

5.1 Analysis on Finite Measure Spaces 

We begin by introducing the space of functions that will be fundamental to our analysis. 

Definition 5.1. Let $\mu$ be a probability distribution on $[q]$. Define $L^2([q], \mu)$ to be the inner product
space of all functions $f : [q] \to \mathbb{R}$ with inner product given by

\[ \langle f, g \rangle = \mathbb{E}_{x \sim \mu}[f(x)g(x)]. \]

For a function $f \in L^2([q], \mu)$ the $L^p$-norm of $f$ is defined as

\[ \|f\|_p = \mathbb{E}_{x \sim \mu}\big[ |f(x)|^p \big]^{1/p}. \]

We will also often use the equivalent notation $L^2(\Omega)$ where $\Omega = ([q], \mu)$ is the corresponding
probability space. We next introduce a useful basis for representing functions $f : [q]^k \to \mathbb{R}$.

Multilinear Representation. Let $\mu$ be a distribution on $[q]$. Pick an orthonormal basis
$\phi = \{\phi_0 = 1, \ldots, \phi_{q-1}\}$ for the set of functions from $[q]$ to $\mathbb{R}$, denoted by $L^2([q], \mu)$.

The tensor product of the orthonormal basis $\phi$ gives an orthonormal basis for functions in
$L^2([q]^k, \mu^k)$. Specifically, for a multi-index $\alpha \in \{0, 1, \ldots, q-1\}^k$, let $\phi_\alpha : [q]^k \to \mathbb{R}$ denote the
function $\phi_\alpha(x) = \prod_{i=1}^{k} \phi_{\alpha_i}(x_i)$. It is easy to see that the functions $\phi_\alpha$ form an orthonormal basis for the space of
functions $L^2([q]^k, \mu^k)$.

We can write any function $p : [q]^k \to \mathbb{R}$ in the $\phi_\alpha$ basis as

\[ p(x) = \sum_{\alpha \in \{0, 1, \ldots, q-1\}^k} \hat{p}_\alpha\, \phi_\alpha(x) \]

where we often refer to the $\hat{p}_\alpha$ as the Fourier coefficients of $p$. For a multi-index $\alpha \in \{0, 1, \ldots, q-1\}^k$
let $|\alpha| = |\{i \mid \alpha_i \neq 0\}|$. The degree of a term $\phi_\alpha(x)$ is given by $|\alpha|$.

Let $p : [q]^k \to \triangle_q$ be a $k$-ary operation represented by $q$ real-valued functions $p = (p_1, \ldots, p_q)$.
Associated with every $k$-ary operation $p$ is a subspace of $L^2(\mu^k)$ given by

\[ \mathrm{Span}(p) = \{ \text{span of } p_1, \ldots, p_q \} \]

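Noise Operator. For a function $p : [q]^k \to \mathbb{R}$ and $\rho \in [0,1]$, define $T_\rho p$ as

\[ T_\rho p(x) = \mathbb{E}_{y \sim_\rho x}[p(y)] \]

where $y \sim_\rho x$ denotes the following distribution,

\[ y_i = \begin{cases} x_i & \text{with probability } \rho \\ \text{independent sample from } \mu & \text{with probability } 1 - \rho \end{cases} \]

The multilinear polynomial associated with $T_\rho p$ is given by

\[ T_\rho p = \sum_{\alpha} \rho^{|\alpha|}\, \hat{p}_\alpha\, \phi_\alpha \]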
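The two descriptions of the noise operator can be compared numerically in the simplest setting. The sketch below is restricted to $q = 2$ with the uniform measure, where the orthonormal basis consists of the parity characters and $|\alpha|$ is the size of the corresponding subset. The function names are hypothetical, and both routines enumerate all inputs, so they are only meant for very small $k$.

```python
from itertools import product

def chi(S, x):
    """Parity character on the +/-1 encoding of the bits of x indexed by S."""
    out = 1
    for i in S:
        out *= 1 - 2 * x[i]
    return out

def noise_direct(p, k, rho, x):
    """T_rho p(x) from the definition: keep x_i w.p. rho, else resample uniformly."""
    total = 0.0
    for keep in product((True, False), repeat=k):
        for fresh in product((0, 1), repeat=k):
            prob = 0.5 ** k
            for kept in keep:
                prob *= rho if kept else 1 - rho
            y = tuple(x[i] if keep[i] else fresh[i] for i in range(k))
            total += prob * p(y)
    return total

def noise_fourier(p, k, rho, x):
    """T_rho p(x) from the expansion sum_S rho^{|S|} p_hat(S) chi_S(x)."""
    total = 0.0
    for mask in range(2 ** k):
        S = tuple(i for i in range(k) if (mask >> i) & 1)
        coeff = sum(p(y) * chi(S, y) for y in product((0, 1), repeat=k)) / 2 ** k
        total += rho ** len(S) * coeff * chi(S, x)
    return total
```

For any function `p` on $k$ bits and any `rho` in $[0,1]$, the two routines should agree up to floating-point error.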

5.2 The Conditional Expectation Operator 

In this section we introduce the conditional expectation operator associated with a joint probability 
distribution. The singular values of this operator encode information about correlation, and thus 
the singular vectors provide a useful basis in which to analyze correlation decay. 

Let $\mu$ be a joint distribution on $[q_1] \times [q_2]$ with marginals $\mu_1$ and $\mu_2$. Further let $\Omega_1 = ([q_1], \mu_1)$
and $\Omega_2 = ([q_2], \mu_2)$ denote the probability spaces corresponding to $\mu_1$ and $\mu_2$.

Definition 5.2. The conditional expectation operator $T_\mu : L^2(\Omega_2) \to L^2(\Omega_1)$ is given by

\[ (T_\mu f)(x) = \mathbb{E}_{(X,Y) \sim \mu}[f(Y) \mid X = x] \]

It follows from the definition that

\[ \mathbb{E}_{(X,Y) \sim \mu}[f(X)g(Y)] = \mathbb{E}_{X \sim \mu_1}[f(X)\,(T_\mu g)(X)] \]

The adjoint operator $T_\mu^* : L^2(\Omega_1) \to L^2(\Omega_2)$ is given by

\[ (T_\mu^* g)(y) = \mathbb{E}_{(X,Y) \sim \mu}[g(X) \mid Y = y] \]

It is possible to choose a singular value decomposition $\{\phi_i, \psi_i\}$ of the operator $T_\mu$ such that
$\phi_0 = \psi_0 = 1$. Let $\sigma_i$ denote the singular value corresponding to the pair $\phi_i, \psi_i$. That is, $T_\mu \psi_i = \sigma_i \phi_i$. It
is easy to check the following facts regarding this singular value decomposition.

Fact 5.3. Let $\{\phi_i, \psi_i\}$ be the above singular value decomposition of $T_\mu$.

• The functions $\phi_i$ and $\psi_i$ form orthonormal bases for $L^2(\Omega_1)$ and $L^2(\Omega_2)$ respectively.

• $\sigma_0 = 1$ and $\sigma_1 = \rho(\mu)$.

• The expectation over $\mu$ of the product of any pair $\phi_i$ and $\psi_j$ is given by

\[ \mathbb{E}_{(x,y) \sim \mu}[\phi_i(x)\psi_j(y)] = \begin{cases} \sigma_i & : i = j \\ 0 & : i \neq j \end{cases} \]

Thus, for functions $F \in L^2([q_1]^k, \mu_1^k)$ and $G \in L^2([q_2]^k, \mu_2^k)$ we can write their multilinear
expansions with respect to tensor powers of the $\phi_i$ and $\psi_i$ bases respectively. In particular, for a
multi-index $\alpha \in \{0, \ldots, q_1 - 1\}^k$ and $x \in [q_1]^k$ we have the basis functions

\[ \phi_\alpha(x) = \prod_{i} \phi_{\alpha_i}(x_i) \]

with $\psi_\beta(y)$ defined similarly. We then have the following fact.

Fact 5.4. The vectors $\{\phi_\alpha\}_{\alpha \in [q_1]^k}$, $\{\psi_\beta\}_{\beta \in [q_2]^k}$ are a singular value decomposition of $T_\mu^{\otimes k}$ with
corresponding singular values $\sigma_\alpha = \prod_i \sigma_{\alpha_i}$. Furthermore $T_{\mu^k} = T_\mu^{\otimes k}$.

Using the above facts about the singular value basis it follows that

\[ \mathbb{E}_{(x,y) \sim \mu^k}[\phi_\alpha(x)\psi_\beta(y)] = \begin{cases} \sigma_\alpha & : \alpha = \beta \\ 0 & : \alpha \neq \beta \end{cases} \]

We can then write the multilinear expansion of $F$ with respect to $\phi$ as

\[ F(x) = \sum_{\alpha} \hat{F}_\alpha\, \phi_\alpha(x) \]

As an immediate consequence of the above discussion we have

Fact 5.5. Let $F \in L^2([q_1]^k, \mu_1^k)$ and $G \in L^2([q_2]^k, \mu_2^k)$. Then

\[ \mathbb{E}_{(x,y) \sim \mu^k}[F(x)G(y)] = \sum_{\alpha} \hat{F}_\alpha \hat{G}_\alpha \sigma_\alpha \]


6 Correlation Decay for Symmetric Polymorphisms 

In this section we prove Theorem 3.3. The proof has two major components. First, we show that 
for an operation where the Fourier weight on certain degree-one coefficients is bounded away from 
one, correlation decreases. Second, we show that for operations admitting a transitive symmetry, 
this Fourier weight does indeed stay bounded away from one after each application of an operation. 

Throughout the section let $\mu$ be a joint distribution on $[q] \times [q]^m$ for some $m$. Let $\mu_1$ and $\mu_2$ be
the respective marginal distributions of $\mu$. Further, let $\Omega_1 = ([q], \mu_1)$ and $\Omega_2 = ([q]^m, \mu_2)$ denote
the probability spaces corresponding to $\mu_1$ and $\mu_2$. It will be useful to think of $q$ as being a constant
and $m$ as possibly being very large.

Further, for an operation $p : [q]^k \to [q]$ we will write $p(\mu)$ for the probability distribution
obtained by sampling $(X, Y) \sim \mu^k$ and outputting the pair $(p(X_1, \ldots, X_k), p(Y_1, \ldots, Y_k)) \in [q] \times
[q]^m$. Recall that we have defined $p(Y_1, \ldots, Y_k)$ for $Y_j \in [q]^m$ to be the result of applying $p$ on each
coordinate of $Y_1, \ldots, Y_k$ separately.


6.1 Linearity and Correlation Decay 

Given a singular value basis $\{\phi_\alpha\}$ for the space $L^2([q]^k, \mu_1^k)$ we call the set of functions $\phi_\alpha$ such that
$|\alpha| = 1$ (i.e. exactly one coordinate of $\alpha$ is nonzero) the degree-one part of the basis. Note that
there are $(q-1)k$ such basis functions: $q - 1$ for each of the $k$ coordinates. We will be interested in keeping
track of those degree-one basis functions with singular values close to the correlation $\rho(\mu)$.

Definition 6.1. Let $\{\phi_\alpha\}$ be a singular value basis as above. The linear multi-indices $\alpha$ are given
by the set

\[ L_\mu = \{ \alpha \mid |\alpha| = 1 \text{ and } \sigma_\alpha \ge \rho(\mu)^2 \} \]

The linear part of a singular value basis is those $\phi_\alpha$ with $\alpha \in L_\mu$.

Next we define a quantitative measure of the linearity of a function. 

Definition 6.2. Let $F \in L^2([q]^k, \mu_1^k)$. The linearity of $F$ is defined as

\[ \mathrm{Lin}_\mu(F) = \sum_{\alpha \in L_\mu} \hat{F}_\alpha^2 \]

Further, for an operation $p : [q]^k \to [q]$ the linearity of $p$ is given by

\[ \mathrm{Lin}_\mu(p) = \sup_{\substack{F \in \mathrm{Span}(p) \\ \|F\|_2 = 1}} \mathrm{Lin}_\mu(F) \]

With these definitions in hand we show that the correlation of $\mu$ decays under any operation $p$
that has linearity bounded away from one.

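Lemma 6.3. (Correlation Decay) Let $p : [q]^k \to [q]$ be an operation. Then

\[ \rho(p(\mu)) \le \rho(\mu) \Big( 1 - \frac{1}{2} (1 - \mathrm{Lin}_\mu(p))(1 - \rho(\mu)^2) \Big) \]

Proof. Let $f : [q] \to \mathbb{R}$ and $g : [q]^m \to \mathbb{R}$ be such that $\mathbb{E}_{x \sim \mu_1^k}[f(p(x))] = 0$, $\mathbb{E}_{y \sim \mu_2^k}[g(p(y))] = 0$,
$\|f(p)\|_2 = \|g(p)\|_2 = 1$ and

\[ \rho(p(\mu)) = \mathbb{E}_{(x,y) \sim \mu^k}[f(p(x))\, g(p(y))]. \]

That is, $f$ and $g$ are the functions achieving the maximum correlation for $p(\mu)$. Now define
$F : [q]^k \to \mathbb{R}$ as $F(x) = f(p(x))$. Similarly, define $G : [q]^{mk} \to \mathbb{R}$ as $G(y) = g(p(y))$.

Writing the multilinear expansion of $F$ with respect to the basis $\phi$, we have $F = \sum_\alpha \hat{F}_\alpha \phi_\alpha$
where

\[ \hat{F}_0 = \mathbb{E}[F] = 0, \qquad \sum_{\alpha} \hat{F}_\alpha^2 = \|F\|_2^2 = 1. \]

Furthermore, since $F \in \mathrm{Span}(p)$ we have

\[ \sum_{\alpha \in L_\mu} \hat{F}_\alpha^2 = \mathrm{Lin}_\mu(F) \le \mathrm{Lin}_\mu(p). \]

Similarly, one can write the multilinear expansion of $G$. Recall that

\[ \mathbb{E}_{(x,y) \sim \mu^k}[f(p(x))\, g(p(y))] = \mathbb{E}_{(x,y) \sim \mu^k}[F(x)G(y)] = \sum_{\alpha} \hat{F}_\alpha \hat{G}_\alpha \sigma_\alpha \]

Therefore, we may apply the Cauchy-Schwarz inequality to obtain

\[
\begin{aligned}
\rho(p(\mu)) &= \sum_{\alpha} \hat{F}_\alpha \hat{G}_\alpha \sigma_\alpha \\
&\le \Big( \sum_{\alpha \neq 0} \hat{F}_\alpha^2 \sigma_\alpha^2 \Big)^{1/2}
\end{aligned}
\]

where in the last step we have used both that $\hat{F}_0 = 0$ and $\sum_\alpha \hat{G}_\alpha^2 = 1$. Let $\rho = \rho(\mu)$ and note that
for all $\alpha \in L_\mu$ we have $\sigma_\alpha \le \rho$. Further, for non-zero $\alpha \notin L_\mu$ we have $\sigma_\alpha \le \rho^2$. Thus, we can split
the above sum to obtain:

\[
\begin{aligned}
\rho(p(\mu)) &\le \Big( \rho^2 \sum_{\alpha \in L_\mu} \hat{F}_\alpha^2 + \rho^4 \sum_{\alpha \notin L_\mu,\ \alpha \neq 0} \hat{F}_\alpha^2 \Big)^{1/2} \\
&\le \rho \big( \mathrm{Lin}_\mu(p) + (1 - \mathrm{Lin}_\mu(p)) \rho^2 \big)^{1/2} \\
&= \rho \big( 1 - (1 - \mathrm{Lin}_\mu(p))(1 - \rho^2) \big)^{1/2} \\
&\le \rho \Big( 1 - \frac{1}{2}(1 - \mathrm{Lin}_\mu(p))(1 - \rho^2) \Big).
\end{aligned}
\]

□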
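To get a feel for the rate implied by Lemma 6.3, the bound can simply be iterated. The sketch below assumes, purely for illustration, that the linearity stays below a fixed value `lin` in every round; establishing such a bound is exactly what the remainder of this section is about. The function names are hypothetical.

```python
def decay_step(rho, lin):
    """Upper bound on the correlation after one operation, as in Lemma 6.3."""
    return rho * (1 - 0.5 * (1 - lin) * (1 - rho ** 2))

def rounds_until(rho, lin, eta, max_rounds=10 ** 6):
    """Number of applications after which the iterated bound drops to eta or below."""
    rounds = 0
    while rho > eta and rounds < max_rounds:
        rho = decay_step(rho, lin)
        rounds += 1
    return rounds
```

Note that `decay_step(rho, 1.0)` returns `rho` unchanged, which reflects that the lemma gives no decay when the linearity equals one.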





6.2 Hypercontractivity and the Berry-Esseen Theorem 

In light of Lemma 6.3, correlation will always decay under application of an operation $p$ as long as
$\mathrm{Lin}_\mu(p)$ is bounded away from one. The main challenge for proving that correlation decays to zero
under repeated application of polymorphisms is in controlling the linearity after each application.
The intuition for our proof is as follows: If a polymorphism has linearity very close to one, then it is
close to a sum of independent random variables, and so the Berry-Esseen Theorem applies to show
that $p(\mu_1)$ is nearly Gaussian. However this should be impossible since $p(\mu_1)$ only takes $q$ distinct
values. The version of the Berry-Esseen Theorem that we will use can be found as Corollary 59 in
Chapter 11 of [O'D14].

Theorem 6.4 (Berry-Esseen [O'D14]). Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be independent random
variables with matching first and second moments; i.e. $\mathbb{E}[X_i^k] = \mathbb{E}[Y_i^k]$ for $i \in [n]$ and $k \in \{1, 2\}$. Let
$S_X = \sum_i X_i$ and $S_Y = \sum_i Y_i$. Then for any $\psi : \mathbb{R} \to \mathbb{R}$ which is $c$-Lipschitz

\[ \big| \mathbb{E}[\psi(S_X)] - \mathbb{E}[\psi(S_Y)] \big| \le O(c) \cdot \sum_{i} \big( \mathbb{E}[|X_i|^3] + \mathbb{E}[|Y_i|^3] \big) \]

Note that since the error term in the Berry-Esseen Theorem depends on the $L^3$-norm of the
random variables, we must control the $L^3$-norms of singular vectors of $T_\mu$. Our main tool to this
end will be hypercontractivity of the noise operator $T_\rho$. We state here a special case of the general
hypercontractivity theorem which will suffice for our purposes.

Theorem 6.5 (Hypercontractivity [O'D14]). Let $\pi$ be a probability distribution on $[q]$ where each
outcome has probability at least $\lambda$. Let $f \in L^2([q]^k, \pi^k)$ for some $k \in \mathbb{N}$. Then for any $0 \le \rho \le \frac{1}{\sqrt{2}} \lambda^{1/6}$,

\[ \|T_\rho f\|_3 \le \|f\|_2 \]

where $T_\rho$ is the noise operator.

Our first task will be to relate $L^3$ norms under the conditional expectation operator $T_\mu$ to those
under the noise operator $T_\rho$. However, these operators do not map between the same spaces. To
fix this, we instead consider the operator $M_\mu = T_\mu T_\mu^*$ so that $M_\mu : L^2(\Omega_1) \to L^2(\Omega_1)$. The following
simple lemma gives a hypercontractivity result for $M_\mu$.

Lemma 6.6. Let $M_\mu = T_\mu T_\mu^*$ and let $\lambda$ be the minimum probability of an atom in the marginal $\mu_1$
of $\mu$. Let $s$ be a number such that $\rho(\mu)^{2s} \le \frac{1}{\sqrt{2}} \lambda^{1/6}$. For any $k \in \mathbb{N}$, if $f \in L^2([q]^k, \mu_1^k)$ then

\[ \|M_{\mu^k}^s f\|_3 \le \|f\|_2 \]

Proof. Observe first that $M_\mu$ is a self-adjoint linear operator; let $\{\sigma_i^2 \mid i \in [q]\}$ be its eigenvalues
and let $\{\phi_i \mid i \in [q]\}$ denote the eigenfunctions. Further $M_{\mu^k} = M_\mu^{\otimes k}$ is a self-adjoint operator with
eigenvalues $\{\sigma_\alpha^2 \mid \alpha \in [q]^k\}$ and eigenfunctions $\{\phi_\alpha \mid \alpha \in [q]^k\}$.

Let $\rho = \rho(\mu)^{2s}$ and define

\[ \nu_\alpha = \frac{\sigma_\alpha^{2s}}{\rho^{|\alpha|}} \]

Observe that $\nu_0 = 1$. Next, because $\sigma_j \le \rho(\mu)$ for all $j \neq 0$ and $\sigma_0 = 1$ we have

\[ \nu_\alpha = \frac{\sigma_\alpha^{2s}}{\rho^{|\alpha|}} \le \frac{\rho(\mu)^{2s|\alpha|}}{\rho^{|\alpha|}} = 1 \]

for all $\alpha \neq 0$. Now define an operator $A$ by setting $A\phi_\alpha = \nu_\alpha \phi_\alpha$ for each $\alpha \in [q]^k$. Since $\{\phi_\alpha\}_{\alpha \in [q]^k}$
form a basis, this uniquely defines $A$. In particular $A$ is a self-adjoint operator with eigenvalues
$\nu_\alpha \le 1$ and corresponding eigenfunctions $\phi_\alpha$ for $\alpha \in [q]^k$. Note that by construction

\[ T_\rho A \phi_\alpha = \rho^{|\alpha|} \nu_\alpha \phi_\alpha = \sigma_\alpha^{2s} \phi_\alpha = M_{\mu^k}^s \phi_\alpha, \]

That is, $T_\rho A$ agrees with $M_{\mu^k}^s$ on the entire $\phi_\alpha$ basis, and so by linearity $T_\rho A = M_{\mu^k}^s$. Now we
compute

\[ \|M_{\mu^k}^s f\|_3 = \|T_\rho A f\|_3 \le \|Af\|_2, \]

where the final inequality follows by applying Theorem 6.5 to the function $Af$. Next, since every
eigenvalue of $A$ is at most 1 we have

\[ \|Af\|_2 \le \|f\|_2. \]

Plugging this into the previous inequality completes the proof. □

We now apply the hypercontractivity theorem in order to control the $L^3$-norms of singular
vectors of $T_{p(\mu)}$ for some operation $p$. The following lemma gives a trade-off between the $L^3$-norm
of a singular vector of $T_{p(\mu)}$ and the magnitude of the corresponding singular value. Further,
the trade-off depends only on properties of the initial distribution $\mu$. This in turn implies that
singular vectors with high $L^3$-norm have low singular values, and thus cannot contribute much to
the correlation $\rho(p(\mu))$.

Lemma 6.7. Let $\lambda$ be the minimum probability of an atom in the marginal $\mu_1$ of $\mu$. Let $s$ be a
number such that $\rho(\mu)^{2s} \le \frac{1}{\sqrt{2}} \lambda^{1/6}$. Let $p : [q]^k \to [q]$ and let $\phi_i \in L^2(\Omega_1)$ be a singular vector of $T_{p(\mu)}$
with singular value $\sigma_i$ for $i \neq 0$. Then

\[ \|\phi_i\|_3 \le \frac{1}{\sigma_i^{2s}} \]

Proof. For any distribution $\pi$ we will use the notation $M_\pi = T_\pi T_\pi^*$. Since $\phi_i$ is a singular vector of
$T_{p(\mu)}$ with singular value $\sigma_i$ we have

\[ \sigma_i^{2s} \|\phi_i\|_3 = \|M_{p(\mu)}^s \phi_i\|_3. \]

In terms of the dual norm, we can write

\[
\begin{aligned}
\|M_{p(\mu)}^s \phi_i\|_3 &= \sup_{\|g\|_{3/2} = 1} \mathbb{E}_{z \sim p(\mu_1)}\big[ g(z)\, M_{p(\mu)}^s \phi_i(z) \big] \\
&= \sup_{\|g\|_{3/2} = 1} \mathbb{E}_{x \sim \mu_1^k}\big[ g(p(x))\, M_{\mu^k}^s (\phi_i \circ p)(x) \big] \\
&\le \sup_{\|g\|_{3/2} = 1} \|g(p)\|_{3/2}\, \|M_{\mu^k}^s (\phi_i \circ p)\|_3 \\
&= \|M_{\mu^k}^s (\phi_i \circ p)\|_3
\end{aligned}
\]

where in the second to last line we have applied Hölder's inequality in the space $L^2([q]^k, \mu_1^k)$. Now
note that by Lemma 6.6,

\[ \|M_{\mu^k}^s (\phi_i \circ p)\|_3 \le \|\phi_i \circ p\|_2 = 1. \]

Combining the above inequalities completes the proof. □

Next we will need the following technical lemma regarding discrete distributions taking only 
few values (e.g. functions F E Span(p) for some polymorphism p). Informally, the lemma states 
that such distributions cannot be close to sums of many i.i.d. random variables. 

Lemma 6.8. Let F be a real-valued random variable taking only q values. Let X t for i E [m] be 
i.i.d. real-valued random variables with E[JQ] = 0, Var[Xj ] £ and E[|Xj| 3 ] ^ B. Let S = X t . 

Then for some absolute constant $c > 0$,

\[ \mathbb{E}[|F - S|] \ge \frac{1}{8q} - cBm. \]

Proof. Let $a_1, \ldots, a_q$ be the set of real values that $F$ takes. Now define the function

$$d(z) = \min_j |z - a_j|$$

which simply measures the distance from $z$ to the nearest $a_j$. Let $v = \mathrm{Var}[\sum_i X_i]$ and let $Y_i$ be
a Gaussian random variable with first two moments matching those of $X_i$. Note that $\sum_i Y_i$ is a
Gaussian with mean zero and variance $v$. Further observe that $d$ is 1-Lipschitz and so we may
apply Theorem 6.4 to obtain

$$\left| \mathbb{E}_{z \sim \mathcal{N}(0,v)} [d(z)] - \mathbb{E}[d(S)] \right| \le cBm$$

for some absolute constant $c$.

Next observe that the set of intervals such that $d(z) \le \delta$ has total length $2q\delta$. Thus, the normal
distribution $\mathcal{N}(0, v)$ has probability mass at most $2q\delta$ on the region where $d(z) \le \delta$. This implies
that

$$\mathbb{E}_{z \sim \mathcal{N}(0,v)} [d(z)] \ge \delta(1 - 2q\delta).$$

Combining this with the previous inequality yields

$$\mathbb{E}[d(S)] \ge \delta(1 - 2q\delta) - cBm.$$

Now let $H = F - S$. Since $F$ only takes the values $a_j$ we have

$$\mathbb{E}[d(S)] = \mathbb{E}[d(F - H)] = \mathbb{E}\left[\min_j |F - H - a_j|\right] \le \mathbb{E}[|F - H - F|] = \mathbb{E}[|H|],$$

where the inequality takes $a_j$ to be the value attained by $F$. Thus we conclude that

$$\mathbb{E}[|H|] \ge \delta(1 - 2q\delta) - cBm.$$

Setting 5 = ^ yields the desired result. □ 

6.3 Correlation Decay 

Throughout the remainder of this section, whenever we refer to a joint probability distribution $\mu$
on $[q] \times [q]^m$ we will fix a singular value basis $\{\phi_\alpha, \psi_\alpha\}$ for the operator $T_\mu^{\otimes k}$. Let $\mu_1$ and $\mu_2$ be
the respective marginals of $\mu$. Whenever we have a function $F \in L_2([q]^k, \mu_1^k)$ we will write $F_\alpha$ to
denote the Fourier coefficients of $F$ with respect to the basis $\phi_\alpha$.

Now we are ready to show that $\mathrm{Lin}_\mu(p)$ stays bounded away from one for operations $p$ that are
transitive symmetric. First, we need the following simple claim which asserts that corresponding
degree one Fourier coefficients of $p$ are all equal.

Claim 6.9. Let $p : [q]^k \to [q]$ be a transitive symmetric operation and let $F \in \mathrm{Span}(p)$. Let $\mu_1$ be
any distribution on $[q]$ and fix a Fourier basis $\phi_\alpha$ for $L_2([q]^k, \mu_1^k)$. Let $\alpha, \beta$ be multi-indices with
$|\alpha| = |\beta| = 1$. Further suppose $\alpha_i = \beta_j$ for the unique pair $i$ and $j$ such that $\alpha_i$ and $\beta_j$ are non-zero.
Then $F_\alpha = F_\beta$.

Proof. Let $\pi$ be a permutation such that $\pi(j) = i$ and $p(x) = p(\pi \circ x)$. Such a permutation always
exists since $p$ is transitive symmetric. Now note

$$F_\alpha = \mathbb{E}[F(x) \phi_{\alpha_i}(x_i)] = \mathbb{E}[F(\pi \circ x) \phi_{\alpha_i}(x_i)] = \mathbb{E}[F(x) \phi_{\beta_j}(x_j)] = F_\beta.$$


□ 

We turn now to the proof of the theorem. The main idea of the proof is to repeatedly apply 
Lemma 6.3 to reduce correlation while using Lemma 6.7 in conjunction with Lemma 6.8 to control 
the linearity in each step. 

Theorem 6.10. Let $p_1, \ldots, p_r$ be transitive symmetric polymorphisms with each $p_i : [q]^k \to [q]$. Let
A be the minimum probability of an atom in the marginal p.\ of p. Then for any ^ > e > 0 and 

r = J Vk5^y lo g 2 ( £_1 )) 

$$\rho(p_1 \otimes p_2 \otimes \cdots \otimes p_r(\mu)) \le \varepsilon.$$

Proof. For the analysis we will break $p_1, \ldots, p_r$ into consecutive segments of length $a$, where $a$ is a
parameter that we will set later. Formally, let $K = k^a$ and let $P_t : [q]^K \to [q]$ be defined as

$$P_t(x) = p_{a(t-1)+1} \otimes \cdots \otimes p_{at}(x).$$

Let $\mu^{(1)} = \mu$ and $\mu^{(t)} = P_1 \otimes \cdots \otimes P_{t-1}(\mu)$. Observe that each $P_t$ is again a transitive symmetric
polymorphism. Next we control $\mathrm{Lin}_{\mu^{(t)}}(P_t)$ only in terms of $\rho(\mu^{(t)})$ and properties of the initial
distribution $\mu$.

Let $F \in \mathrm{Span}(P_t)$ so that $\mathbb{E}_{\mu^{(t)}}[F] = 0$ and $\|F\|_2 = 1$. Let $\{\phi_\alpha, \psi_\alpha\}$ be a singular value basis
for $T_{\mu^{(t)}}^{\otimes K}$ and let $F_\alpha$ be the Fourier coefficients of $F : [q]^K \to \mathbb{R}$ with respect to the $\phi_\alpha$ basis. Now
define the function $l_i : [q] \to \mathbb{R}$ to be

$$l_i(x) = \sum_{\alpha_i \ne 0} F_\alpha \phi_{\alpha_i}(x),$$

where the sum ranges over multi-indices $\alpha$ with $|\alpha| = 1$ whose non-zero entry is $\alpha_i$.

In words, each $l_i$ is simply the linear part of $F$ corresponding to the $i$th coordinate. Claim 6.9
implies that for any pair $i$ and $j$ the sums for $l_i$ and $l_j$ are equal term-by-term. Thus, $l_i(x) = l_j(x)$
for all $i, j$. Now note that

$$F(x) = \sum_i l_i(x_i) + H(x)$$

where $H(x) = \sum_{|\alpha| > 1} F_\alpha \phi_\alpha(x)$ is the non-linear part of $F$. Since $\|F\|_2 = 1$ and the $l_i$ are all
equal and orthogonal, it follows that $\|l_i\|_2^2 \le \frac{1}{K}$ for all $i$. This further implies that every
degree-one coefficient satisfies $|F_\alpha| \le \frac{1}{\sqrt{K}}$.

Thus, the $l_i(x_i)$ are i.i.d. random variables with mean zero and variance at most $\frac{1}{K}$. Further
note that




D oti 3 


(*) 




Let s = fl j. Now we apply Lemma 6.7 with p = Pi ® ■ ■ ■ (g) P t -1 to obtain 


i*2 


(p 


CH^ o 


rF7 5Z 

VK v> 

ai^O 




where in the last inequality we have use the fact that <r a ^ p(p^ l> ) for a^O, and that the sum has 
$q$ terms. Therefore, since $F(x)$ is a random variable taking only $q$ different values we may apply
Lemma 6.8 to obtain

n\H(x)\] > i_ _ c|| k\\lK > 1 - cq 3 K-lp(pU)- 6s . 

Note that setting $a = 4 \log(c q^3 \varepsilon^{-6s})$ yields

$$K^{-1/2} = k^{-a/2} \le (c q^3 \varepsilon^{-6s})^{-2}.$$

Thus as long as $\rho(\mu^{(t)}) \ge \varepsilon$ we have

e[\h(x)\] ^ ^ 

where the last inequality follows from e < So, letting 5 = E[iL(x) 2 ] ^ E[|iL(x)|] 2 = 10 p o we 
obtain Lin^ w(Pt) ^1-5. 

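Now we apply Lemma 6.3 to $P_t$ to obtain

$$\begin{aligned}
\rho(\mu^{(t+1)}) &\le \rho(\mu^{(t)}) \left(1 - \tfrac{1}{2}\left(1 - \mathrm{Lin}_{\mu^{(t)}}(P_t)\right)\left(1 - \rho(\mu^{(t)})^2\right)\right) \\
&\le \rho(\mu^{(t)}) \left(1 - \tfrac{\delta}{2}\left(1 - \rho(\mu^{(t)})^2\right)\right) \\
&= \left(1 - \tfrac{\delta}{2}\right) \rho(\mu^{(t)}) + \tfrac{\delta}{2}\, \rho(\mu^{(t)})^3.
\end{aligned}$$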

Solving this recurrence shows that $\rho(\mu^{(t)}) \le \varepsilon$ after $t = O_q(\log(\varepsilon^{-1}))$ steps. Since each step
corresponds to $a$ applications of polymorphisms, we get that the total number of applications
required is

$$r = \Omega_q(a \log(\varepsilon^{-1})) = \Omega_q(s \log^2(\varepsilon^{-1})).$$
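
As a numerical illustration of the logarithmic step count, the following sketch iterates the correlation recurrence, read here as $\rho_{t+1} = (1 - \delta/2)\rho_t + (\delta/2)\rho_t^3$. The exact constants are an assumption about the extracted text, and the function name `steps_to_decay` and the sample parameters are hypothetical.

```python
def steps_to_decay(rho0: float, delta: float, eps: float, max_steps: int = 10**6) -> int:
    """Count iterations of rho <- (1 - delta/2)*rho + (delta/2)*rho**3 until rho <= eps.

    Assumed form of the recurrence; rho0 in (0, 1), delta in (0, 1], eps in (0, rho0).
    """
    rho, t = rho0, 0
    while rho > eps:
        rho = (1 - delta / 2) * rho + (delta / 2) * rho ** 3
        t += 1
        if t >= max_steps:
            raise RuntimeError("recurrence did not reach eps within max_steps")
    return t


if __name__ == "__main__":
    # The step count grows roughly linearly in log(1/eps) for fixed rho0 and delta.
    for eps in (1e-1, 1e-2, 1e-4, 1e-8):
        print(eps, steps_to_decay(0.9, 0.1, eps))
```

Since $\rho_t$ is decreasing, each step multiplies it by at most $1 - \frac{\delta}{2}(1 - \rho_0^2)$, which is where the $O(\log(\varepsilon^{-1}))$ behaviour comes from once $\rho_0$ and $\delta$ are treated as constants.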

□ 
We now use the connection given in Lemma 3.2 between correlation and statistical distance to
prove Theorem 3.3.

Proof of Theorem 3.3. Let $P = p_1 \otimes p_2 \otimes \cdots \otimes p_r$. The proof that $\|P(\mu) - P(\mu^\times)\|_1 \le \eta$ is by a
hybrid argument. We define intermediate distributions $\pi^{(i)}$ by drawing a sample $X$ from $\mu$, and
then independently re-sampling each of the coordinates $j > i$ from their respective marginals $\mu_j$.
Let $Y_j$ be the random variable corresponding to an independent sample from the marginal $\mu_j$. We
will use the notation $\pi^{(i)} = (X_1, \ldots, X_i, Y_{i+1}, \ldots, Y_n)$.

Note that since the $Y_j$ are independent of each other and of the $X_i$ we have

$$\|P(\pi^{(i-1)}) - P(\pi^{(i)})\|_1 = \|(P(X_1), \ldots, P(X_{i-1}), P(Y_i)) - (P(X_1), \ldots, P(X_i))\|_1.$$

Let $\varepsilon = \frac{\eta}{qn}$. By Theorem 6.10 we have $\rho(P(X_i), (P(X_1), \ldots, P(X_{i-1}))) \le \varepsilon$. Further, since $Y_i$ is
independent of $X_1, \ldots, X_{i-1}$ we may apply Lemma 3.2 to obtain

$$\|(P(X_1), \ldots, P(X_{i-1}), P(Y_i)) - (P(X_1), \ldots, P(X_i))\|_1 \le q\varepsilon.$$

Now since $\pi^{(0)} = \mu^\times$ and $\pi^{(n)} = \mu$ we have by the triangle inequality

$$\|P(\mu) - P(\mu^\times)\|_1 \le \sum_{i=0}^{n-1} \|P(\pi^{(i)}) - P(\pi^{(i+1)})\|_1 \le nq\varepsilon = \eta.$$

□


7 Approximate Polymorphisms for CSPs 

7.1 Background 

Constraint Satisfaction Problems. Fix an alphabet $[q]$. A CSP $\Lambda$ over $[q]$ is given by a family
of payoff functions $\Lambda = \{c : [q]^k \to [-1,1]\}$. An instance $\Im$ of Max-$\Lambda$ consists of a set of variables
$V = \{X_1, \ldots, X_n\}$ and a set of constraints $\mathcal{C} = \{C_1, \ldots, C_m\}$ where each $C_i(X) = c(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$
for some $c \in \Lambda$. The set $\mathcal{C}$ is equipped with a probability distribution $w : \mathcal{C} \to \mathbb{R}^+$.

The goal is to find an assignment $x \in [q]^V$ that maximizes

$$\mathrm{val}_\Im(x) \stackrel{\mathrm{def}}{=} \sum_{C \in \mathcal{C}} w(C)\, C(x).$$

If the probability distribution $w$ is clear from the context, we will write

$$\mathrm{val}_\Im(x) = \mathbb{E}_{C \sim \mathcal{C}}[C(x)].$$
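
To make the definition of the objective concrete, here is a minimal sketch that evaluates $\mathrm{val}(x) = \sum_C w(C)\,C(x)$ on a toy weighted instance. The instance (a cut-style payoff on a triangle), the tuple layout, and the helper names `val` and `best_assignment` are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def cut_payoff(a: int, b: int) -> float:
    """Hypothetical payoff in [-1, 1]: +1 if the two values differ, -1 otherwise."""
    return 1.0 if a != b else -1.0


# Each constraint is (payoff function, tuple of variable indices, weight w(C)).
# The weights form a probability distribution over the constraints.
constraints = [
    (cut_payoff, (0, 1), 0.5),
    (cut_payoff, (1, 2), 0.25),
    (cut_payoff, (0, 2), 0.25),
]


def val(constraints, x):
    """Weighted objective: sum over constraints of w(C) * C(x restricted to C's variables)."""
    return sum(w * c(*(x[i] for i in idx)) for c, idx, w in constraints)


def best_assignment(constraints, n, q):
    """Brute-force maximizer over [q]^n; only sensible for tiny instances."""
    return max(product(range(q), repeat=n), key=lambda x: val(constraints, x))
```

Brute force is used only to keep the example self-contained; it says nothing about how such instances are solved or approximated in general.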

Remark 7.1. A natural special case of the above definition is the set of unweighted constraint
satisfaction problems wherein the weight function $w : \mathcal{C} \to \mathbb{R}^+$ is uniform, i.e.,

$$\mathrm{val}_\Im(x) = \frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} C(x).$$

In terms of approximability, the unweighted case is no easier than the weighted version since for
every constant $\eta > 0$, there is a polynomial time reduction from the weighted version to its unweighted
counterpart with a loss of at most $\eta$ in the approximation.
Dictatorship tests.

Definition 7.2. Fix constants $-1 \le s \le c \le 1$, an integer $R \in \mathbb{N}$ and a family of functions
$\mathcal{F} = \{p : [q]^R \to [q]\}$. A $(c, s)$-dictatorship test for Max-$\Lambda$ against the family $\mathcal{F}$ consists of an
instance $\Im$ of Max-$\Lambda$ with the following properties: The variables of $\Im$ are identified with $[q]^R$ and
therefore an assignment to $\Im$ is given by $p : [q]^R \to [q]$.

• (completeness) For every $i \in [R]$, the $i$th dictator assignment $p_i : [q]^R \to [q]$ given by $p_i(x) = x^{(i)}$
satisfies $\Im(p_i) \ge c$.

• (soundness) For every function $p \in \mathcal{F}$, we have $\Im(p) \le s$.

The intimate relation between polymorphisms and dictatorship testing was observed in [KS09];
here we spell out the connection to approximate polymorphisms. First, for a function family
$\mathcal{F} = \{p : [q]^R \to [q]\}$, we will say an approximate polymorphism $\mathcal{P}$ is supported on $\mathcal{F}$ if the probability
distribution $\mathcal{P}$ is supported only on functions within the family $\mathcal{F}$.

Theorem 7.3. Given a CSP $\Lambda$, a natural number $R \in \mathbb{N}$ and a finite family of functions
$\mathcal{F} = \{p : [q]^R \to [q]\}$ closed under permutation of inputs, the following are equivalent:

• Max-$\Lambda$ does not admit a $(c, s)$-approximate polymorphism supported on $\mathcal{F}$.

• There exists a $(c, s)$-dictatorship test for Max-$\Lambda$ against the family $\mathcal{F}$.

Proof. Suppose Max-$\Lambda$ does not admit a $(c, s)$-approximate polymorphism supported on $\mathcal{F}$. In other
words, for every probability distribution $\mathcal{P}$ over $\mathcal{F}$, there exists an instance $\Im$ and $R$ solutions
$X^{(1)}, \ldots, X^{(R)}$ of average value $c$ such that $\mathbb{E}_{p \in \mathcal{P}}\, \Im(p(X^{(1)}, \ldots, X^{(R)})) < s$.

Notice that the set of probability distributions $\mathcal{P}$ over $\mathcal{F}$ forms a convex set. Furthermore, the
set of all instances $\Im$ of Max-$\Lambda$ along with $R$ assignments $X^{(1)}, \ldots, X^{(R)}$ with average value at
least $c$ forms a convex set. Specifically, given two instances $\Im_1, \Im_2$ and any $\theta \in [0,1]$, one can
construct the instance $\theta \Im_1 + (1 - \theta) \Im_2$ by taking a disjoint union of the two instances weighted
appropriately. Since the set of all probability distributions $\mathcal{P}$ over $\mathcal{F}$ is a compact convex set, by
the min-max theorem there exists a single instance $\Im$ and a set of $R$ solutions for it that serve as a
counterexample against every probability distribution $\mathcal{P}$ over the family $\mathcal{F}$.

We will use the instance $\Im$ and the set of solutions $X^{(1)}, \ldots, X^{(R)}$ to create the following
dictatorship test.

• Sample a random constraint $C$ from $\Im$. Let $C$ be a constraint on variables $V_{i_1}, \ldots, V_{i_k}$ in $\Im$.

• Pick a random permutation $\pi : [R] \to [R]$ and set $y_j = (X_j^{(\pi(1))}, \ldots, X_j^{(\pi(R))})$ for each variable $V_j$ of $\Im$.

• Test the constraint $C$ on $p(y_{i_1}), \ldots, p(y_{i_k})$.

Suppose $p(y) = y^{(i)}$ for some $i \in [R]$. In this case, it is easy to see that over the random choice
of constraint $C$ and permutation $\pi$, the expected payoff of the dictatorship test is
exactly equal to the average value of the solutions $X^{(1)}, \ldots, X^{(R)}$.

Let $p : [q]^R \to [q]$ denote any function in the family $\mathcal{F}$. If the expected payoff of $p$ is greater
than $s$ then there exists a permutation $\pi : [R] \to [R]$ for which

$$\Im(p(X^{(\pi(1))}, \ldots, X^{(\pi(R))})) > s.$$

This is a contradiction, since $p \circ \pi$ would also be a function in the family $\mathcal{F}$. Therefore, for every
function $p \in \mathcal{F}$, its value on the dictatorship test is at most $s$. This completes the proof of one
direction of the implication.
Now, we will show the other direction, namely that if there exists a $(c, s)$-dictatorship test then
there is no $(c, s)$-approximate polymorphism. Suppose there is a $(c, s)$-dictatorship test
against the family $\mathcal{F}$. Consider the instance $\Im$ given by the dictatorship test and the family of $R$
assignments corresponding to the dictator functions, i.e., $p_i(x) = x^{(i)}$. By the completeness of the
dictatorship test, for each of the dictator assignments $p_i$, we will have $\Im(p_i) \ge c$. On the other
hand, by the soundness of the dictatorship test, for each function $p$ in the family $\mathcal{F}$, $\Im(p) \le s$. This
implies that there does not exist a $(c, s)$-approximate polymorphism supported on the family $\mathcal{F}$.

□ 

Unique Games Conjecture. For the sake of completeness, we recall the unique games conjecture
here.

Definition 7.4. An instance of Unique Games, represented as $\Gamma = (A \cup B, E, \Pi, [R])$, consists of a
bipartite graph over node sets $A, B$ with the edges $E$ between them. Also part of the instance is a
set of labels $[R] = \{1, \ldots, R\}$, and a set of permutations $\pi_{ab} : [R] \to [R]$ for each edge $e = (a, b) \in E$.
An assignment $\mathcal{A}$ of labels to vertices is said to satisfy an edge $e = (a, b)$ if $\pi_{ab}(\mathcal{A}(a)) = \mathcal{A}(b)$. The
objective is to find an assignment $\mathcal{A}$ that satisfies the maximum number of edges.
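
As an illustration of this definition, the following sketch computes the fraction of satisfied edges for a labeling and finds the optimum of a tiny instance by exhaustive search. The instance data and the helper names are hypothetical, and labels are 0-indexed for convenience.

```python
from itertools import product


def satisfied_fraction(edges, perms, label_a, label_b):
    """Fraction of edges (a, b) with perms[(a, b)][label_a[a]] == label_b[b]."""
    good = sum(1 for (a, b) in edges if perms[(a, b)][label_a[a]] == label_b[b])
    return good / len(edges)


def brute_force_value(edges, perms, n_a, n_b, R):
    """Best satisfiable fraction over all labelings; exponential time, toy use only."""
    best = 0.0
    for la in product(range(R), repeat=n_a):
        for lb in product(range(R), repeat=n_b):
            best = max(best, satisfied_fraction(edges, perms, la, lb))
    return best


# Toy instance with R = 2 labels on the complete bipartite graph K_{2,2}.
# A permutation is stored as a tuple: perm[label_at_a] = required label at b.
identity, swap = (0, 1), (1, 0)
edges = [(0, 0), (0, 1), (1, 0), (1, 1)]
perms = {(0, 0): identity, (0, 1): identity, (1, 0): identity, (1, 1): swap}
```

In this instance the four constraints are mutually inconsistent around the cycle, so at most three of the four edges can be satisfied by any labeling.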

The unique games conjecture of Khot [Kho02] asserts that the unique games CSP is hard to 
approximate in the following sense. 

Conjecture 7.5. (Unique Games Conjecture [Kho02]) For all constants $\delta > 0$, there exists a large
enough constant $R$ such that given a bipartite unique games instance
$\Gamma = (A \cup B, E, \Pi = \{\pi_e : [R] \to [R] : e \in E\}, [R])$ with number of labels $R$, it is NP-hard to distinguish
between the following two cases:

• $(1 - \delta)$-satisfiable instances: There exists an assignment $\mathcal{A}$ of labels that satisfies a $(1 - \delta)$-fraction
of all edges.

• Instances that are not $\delta$-satisfiable: No assignment satisfies more than a $\delta$-fraction of the
constraints $\Pi$.

7.2 Quasirandom functions 

The function families of interest in this work are those with no influential coordinates. To make the
definition precise, we begin by recalling a few analytic notions.

Low Degree Influences. Fix a probability distribution $\mu$ over $[q]$. Let $\{\chi_0, \ldots, \chi_{q-1}\}$ be an
orthonormal basis for the vector space $L_2([q], \mu)$. Without loss of generality, we can fix a basis such
that $\chi_0 = 1$. Given a function $f : [q]^k \to \mathbb{R}$, we can write $f$ as

$$f = \sum_{\sigma \in \mathbb{N}^k} \hat{f}_\sigma \chi_\sigma,$$

where $\chi_\sigma(x) = \prod_{j=1}^{k} \chi_{\sigma_j}(x_j)$. Define the degree $d$ influence of the $i$th coordinate of $f$ under the
probability distribution $\mu$ as

$$\mathrm{Inf}^{<d}_{i,\mu}(f) = \sum_{\sigma \in \mathbb{N}^k,\ \sigma_i \ne 0,\ |\sigma| < d} \hat{f}_\sigma^2.$$

More generally, for a vector valued function $f : [q]^k \to \mathbb{R}^D$, we will set

$$\mathrm{Inf}^{<d}_{i,\mu}(f) = \sum_{j \in [D]} \mathrm{Inf}^{<d}_{i,\mu}(f_j).$$

A useful property of low-degree influences is that their sum is bounded as expounded in the following 
lemma. 
Lemma 7.6. For every function $f : [q]^k \to \mathbb{R}$ and every probability distribution $\mu$ over $[q]$,

$$\sum_{i \in [k]} \mathrm{Inf}^{<d}_{i,\mu}(f) \le d \cdot \mathrm{Var}_\mu(f),$$

where $\mathrm{Var}_\mu(f)$ denotes the variance of $f$ under the probability distribution $\mu$.

Given an operation $p : [q]^k \to [q]$, we will associate a function $\tilde{p} : [q]^k \to \Delta_q$ given by $\tilde{p}(x) = e_{p(x)}$,
where $e_{p(x)} \in \Delta_q$ is the basis vector along the $p(x)$th coordinate. We will abuse notation and use $p$
to denote both the $[q]$-valued function $p$ and the corresponding real-vector valued function $\tilde{p}$. For
example, $\mathrm{Inf}^{<d}_{i,\mu}(p)$ will denote the influence of the $i$th coordinate on $\tilde{p}$.


Definition 7.7. An approximate polymorphism $\mathcal{P}$ is $(\tau, d)$-quasirandom if for every probability
distribution $\mu \in \Delta_q$,

$$\mathbb{E}_{p \in \mathcal{P}} \left[ \max_i \mathrm{Inf}^{<d}_{i,\mu}(p) \right] \le \tau.$$


The following lemma shows that transitive symmetries imply quasirandomness. 


Lemma 7.8. An operation $p : [q]^k \to [q]$ that has a transitive symmetry is $(\frac{qd}{k}, d)$-quasirandom for
all $d \in \mathbb{N}$.

Proof. The lemma is a consequence of two facts. First, the sum of degree $d$ influences of a function
$p : [q]^k \to [q]$ is always at most $qd$. Second, if the operation $p$ admits a transitive symmetry, then
the degree $d$ influence of each coordinate is the same. □
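
The two facts used in this proof can be observed numerically in the simplest setting. The sketch below assumes $q = 2$ with the uniform distribution, where $\chi_0 = 1$ and $\chi_1(x) = (-1)^x$ form an orthonormal basis, and computes low-degree influences of an operation via its indicator-vector encoding. All function names are hypothetical, and majority on three bits stands in for an operation with a transitive symmetry.

```python
from itertools import product


def fourier_coefficients(f, k):
    """Coefficients of f: {0,1}^k -> R in the basis chi_S(x) = prod_i (-1)^(S_i * x_i),
    under the uniform distribution (an assumed special case of the general setup)."""
    points = list(product((0, 1), repeat=k))
    coeffs = {}
    for S in product((0, 1), repeat=k):
        total = 0.0
        for x in points:
            sign = (-1) ** sum(s * xi for s, xi in zip(S, x))
            total += f(x) * sign
        coeffs[S] = total / len(points)
    return coeffs


def low_degree_influence(f, k, i, d):
    """Sum of squared coefficients over sigma with sigma_i != 0 and |sigma| < d."""
    coeffs = fourier_coefficients(f, k)
    return sum(c * c for S, c in coeffs.items() if S[i] == 1 and sum(S) < d)


def operation_influence(p, k, i, d, q=2):
    """Influence of coordinate i on the vector-valued encoding x -> e_{p(x)}."""
    return sum(
        low_degree_influence(lambda x, j=j: 1.0 if p(x) == j else 0.0, k, i, d)
        for j in range(q)
    )


def majority(x):
    """Majority of an odd number of bits; symmetric under all coordinate permutations."""
    return 1 if 2 * sum(x) > len(x) else 0
```

For majority on three bits every coordinate has the same influence, whereas a dictator such as `lambda x: x[0]` concentrates all of its influence on a single coordinate.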


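Noise Operator. For a function $p : [q]^k \to \mathbb{R}$ and $\rho \in [0,1]$, define $T_\rho p$ as

$$T_\rho p(x) = \mathbb{E}_{y \sim_\rho x}[p(y)]$$

where $y \sim_\rho x$ denotes the following distribution:

$$y_i = \begin{cases} x_i & \text{with probability } \rho \\ \text{independent sample from } \mu & \text{with probability } 1 - \rho. \end{cases}$$

The multilinear polynomial associated with $T_\rho p$ is given by

$$T_\rho p = \sum_{\sigma} \rho^{|\sigma|}\, \hat{p}_\sigma \chi_\sigma.$$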
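
The agreement between the probabilistic definition of the noise operator and its Fourier formula can be verified directly in a small case. The sketch below assumes $q = 2$ and $\mu$ uniform, so that $\chi_1(x) = (-1)^x$; the function names and the test function are hypothetical.

```python
from itertools import product


def chi(S, x):
    """Character chi_S(x) = prod_i (-1)^(S_i * x_i) on {0,1}^k (uniform measure)."""
    return (-1) ** sum(s * xi for s, xi in zip(S, x))


def noise_direct(f, k, rho, x):
    """E[f(y)] where each y_i equals x_i with probability rho and is otherwise
    replaced by a fresh uniform bit."""
    keep = rho + (1 - rho) / 2   # probability that y_i == x_i
    flip = (1 - rho) / 2         # probability that y_i != x_i
    total = 0.0
    for y in product((0, 1), repeat=k):
        prob = 1.0
        for xi, yi in zip(x, y):
            prob *= keep if xi == yi else flip
        total += prob * f(y)
    return total


def noise_fourier(f, k, rho, x):
    """Evaluate sum over S of rho^|S| * fhat(S) * chi_S(x)."""
    points = list(product((0, 1), repeat=k))
    total = 0.0
    for S in product((0, 1), repeat=k):
        coeff = sum(f(y) * chi(S, y) for y in points) / len(points)
        total += rho ** sum(S) * coeff * chi(S, x)
    return total
```

The two evaluations coincide because resampling a coordinate from $\mu$ kills every non-constant character in that coordinate, leaving a factor $\rho$ per non-zero entry of $\sigma$.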


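Approximation Thresholds. For each $\tau > 0$ and $d \in \mathbb{N}$, let $\mathcal{F}_{\tau,d}$ denote the family of all
$(\tau, d)$-quasirandom functions.

Definition 7.9. Given a CSP $\Lambda$, and a constant $c \in [-1,1]$ define

$$s_\Lambda(c) \stackrel{\mathrm{def}}{=} \sup\{ s \mid \forall \tau > 0, d \in \mathbb{N},\ \exists \text{ a } (\tau, d)\text{-quasirandom } (c, s)\text{-approximate polymorphism for Max-}\Lambda \}.$$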


Moreover, let

$$\alpha_\Lambda \stackrel{\mathrm{def}}{=} \inf_{c > 0} \frac{s_\Lambda(c)}{c}.$$

The following observations are immediate consequences of the definition of $s_\Lambda$.

Observation 7.10. The map $s_\Lambda : [-1,1] \to [-1,1]$ is monotonically increasing and
$s_\Lambda(c + \varepsilon) \le s_\Lambda(c) + \varepsilon$ for every $c, \varepsilon$ such that $c, c + \varepsilon \in (-1,1)$.
7.3 Rounding Semidefinite Programs via Polymorphisms

In this section, we will restate the soundness analysis of dictatorship tests in [Rag08] in terms of
approximate polymorphisms. Specifically, we will construct a rounding scheme for the BasicSDP
relaxation using approximate polymorphisms, and thereby prove Theorem 1.12.

Basic SDP relaxation. Given a $\Lambda$-CSP instance $\Im = (V, \mathcal{C})$, the goal of the BasicSDP relaxation is
to find a collection of vectors $\{b_{v,a}\}_{v \in V, a \in [q]}$ in a sufficiently high dimensional space and a collection
$\{\mu_C\}_{C \in \mathrm{supp}(\mathcal{C})}$ of distributions over local assignments. For each payoff $C \in \mathcal{C}$, the distribution $\mu_C$
is a distribution over $[q]^{V(C)}$ corresponding to assignments for the variables $V(C)$. We will write
$\Pr_{x \sim \mu_C}\{E\}$ to denote the probability of an event $E$ under the distribution $\mu_C$.

BasicSDP Relaxation

$$\begin{aligned}
\text{maximize} \quad & \mathbb{E}_{C \sim \mathcal{C}}\, \mathbb{E}_{x \sim \mu_C}\, C(x) && \text{(BasicSDP)} \\
\text{subject to} \quad & \langle b_{v,i}, b_{v',j} \rangle = \Pr_{x \sim \mu_C}\{x_v = i, x_{v'} = j\} && (C \in \mathrm{supp}(\mathcal{C}),\ v, v' \in V(C),\ i, j \in [q]) \\
& \mu_C \in \Delta([q]^{V(C)}) && \forall C \in \mathrm{supp}(\mathcal{C})
\end{aligned}$$

We claim that the above optimization problem can be solved in polynomial time. To show this
claim, let us introduce additional real-valued variables $\mu_{C,x}$ for $C \in \mathrm{supp}(\mathcal{C})$ and $x \in [q]^{V(C)}$. We
add the constraints $\mu_{C,x} \ge 0$ and $\sum_{x \in [q]^{V(C)}} \mu_{C,x} = 1$. We can now make the following substitutions
to eliminate the distributions $\mu_C$:

$$\mathbb{E}_{x \sim \mu_C} C(x) = \sum_{x \in [q]^{V(C)}} C(x)\, \mu_{C,x},$$

$$\Pr_{x \sim \mu_C}\{x_i = a\} = \sum_{x \in [q]^{V(C)},\ x_i = a} \mu_{C,x},$$

$$\Pr_{x \sim \mu_C}\{x_i = a, x_j = b\} = \sum_{x \in [q]^{V(C)},\ x_i = a,\ x_j = b} \mu_{C,x}.$$

After substituting the distributions $\mu_C$ by the scalar variables $\mu_{C,x}$, it is clear that an optimal
solution to the relaxation of $\mathcal{C}$ can be computed in time $\mathrm{poly}(m^k, |\mathrm{supp}(\mathcal{C})|)$. In the rest of the
section, we will show how to obtain a rounding scheme for the above SDP using an approximate
polymorphism $\mathcal{P}$.
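
The substitutions above are linear in the scalar variables, which a short sketch can make explicit. Here a local distribution is stored as a dictionary from assignments to probabilities; this representation and the helper names are assumptions made for illustration only.

```python
def expected_payoff(payoff, mu):
    """Sum over local assignments x of C(x) * mu[x]."""
    return sum(prob * payoff(*x) for x, prob in mu.items())


def marginal(mu, i, a):
    """Pr[x_i = a] as the sum of mu[x] over assignments with x_i = a."""
    return sum(prob for x, prob in mu.items() if x[i] == a)


def pair_marginal(mu, i, a, j, b):
    """Pr[x_i = a, x_j = b] as the sum of mu[x] over matching assignments."""
    return sum(prob for x, prob in mu.items() if x[i] == a and x[j] == b)


def neq(a, b):
    """Hypothetical payoff: +1 if the two values differ, -1 otherwise."""
    return 1.0 if a != b else -1.0


# Hypothetical local distribution on two variables: uniform over (0, 1) and (1, 0).
mu_C = {(0, 1): 0.5, (1, 0): 0.5}
```

Each quantity is a fixed linear combination of the entries of `mu_C`, which is why the relaxation remains a semidefinite program after the substitution.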

The overall idea behind the rounding scheme is as follows. Let $(\mathbf{V}, \mu)$ be a feasible solution to
the BasicSDP where $\mathbf{V}$ consists of the vectors and $\mu$ consists of the associated local distributions.

Let $\mathcal{P}$ be a $(c, s)$-approximate polymorphism. If the polymorphism $\mathcal{P}$ is given as input integral
assignments whose average value is equal to $c$, it would output an integral assignment of expected
value $s$. This would be a rounding for the BasicSDP relaxation certifying that on instances with
SDP value at least $c$, the optimum is at least $s$. However, in general, we do not have access to
integral solutions of value $c$ and there might not exist any. The idea is to give as input to $\mathcal{P}$
real-valued assignments such that they have value $c$, and the polymorphism $\mathcal{P}$ cannot distinguish
these real-valued assignments from integral assignments. Specifically, these real-valued assignments
are Gaussian random variables obtained by taking random projections of the SDP vectors.
The following is the formal description of the rounding procedure Round-p. 
Round$_{\mathcal{P}}$ Scheme

Input: A $\Lambda$-CSP instance $\Im = (V, \mathcal{C})$ with $m$ variables and an SDP solution $\{b_{v,i}\}, \{\mu_C\}$
with value at least $c$. A $(\tau, d)$-quasirandom $(c - \eta, s)$-approximate polymorphism $\mathcal{P}$.

Truncation Function: Let $f_\Delta : \mathbb{R}^q \to \Delta_q$ be a Lipschitz-continuous function such that for all
$x \in \Delta_q$, $f_\Delta(x) = x$.

Rounding Scheme: Fix $\varepsilon = \eta / 10k$ where $k$ is the arity of $\Lambda$.

• Sample $R$ vectors $\zeta^{(1)}, \ldots, \zeta^{(R)}$ with each coordinate being an i.i.d. normal random variable.

• Sample an operation $p \sim \mathcal{P}$.

For each $v \in V$ do

• For all $j \in [R]$ and $c \in [q]$, compute the projection $g^{(j)}_{v,c}$ of the vector $b_{v,c}$ as follows:

$$g^{(j)}_{v,c} = \langle I, b_{v,c} \rangle + \langle (b_{v,c} - \langle I, b_{v,c} \rangle I), \zeta^{(j)} \rangle.$$

• Let $H$ denote the multilinear polynomial corresponding to the function $T_{1-\varepsilon}\, p : \Delta_q^R \to \Delta_q$.
Evaluate the function $H = T_{1-\varepsilon}\, p$ with $g^{(j)}_{v,c}$ as inputs. In other words, compute
$z_v = (z_{v,1}, \ldots, z_{v,q})$ as follows:

$$z_v = H(g_v).$$

• Round $z_v$ to $z_v^* \in \Delta_q$ by using the Lipschitz-continuous truncation function $f_\Delta : \mathbb{R}^q \to \Delta_q$, i.e.,

$$z_v^* = f_\Delta(z_v).$$

• Independently for each $v \in V$, assign a value $j \in [q]$ sampled from the distribution $z_v^* \in \Delta_q$.


Now we will analyze the performance of the above rounding scheme. First, we observe the 
following fact about approximate polymorphisms. 


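Lemma 7.11. Fix an instance $\Im$ of Max-$\Lambda$ and a distribution $\Theta$ over assignments to $\Im$ with
$\mathbb{E}_{X \sim \Theta}[\Im(X)] \ge c$. Suppose $\mathcal{P}$ is a $(c - \eta, s)$-approximate polymorphism for Max-$\Lambda$ for some $\eta > 0$,
then we have

$$\mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{X^{(1)}, \ldots, X^{(R)} \sim \Theta} \left[ \Im(p(X^{(1)}, \ldots, X^{(R)})) \right] \ge s,$$

where $X^{(1)}, \ldots, X^{(R)}$ are sampled independently from $\Theta$.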


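Proof. Let $N \in \mathbb{N}$ be a large positive integer. Let $\Im'$ be an instance consisting of $N$ disjoint copies
of $\Im$. Define a distribution $\Theta'$ of assignments to $\Im'$ consisting of $N$ i.i.d. samples from $\Theta$. While the
distribution $\Theta$ only satisfies an average bound on objective value $\mathbb{E}_{X \sim \Theta}[\Im(X)] \ge c$, the distribution
$\Theta'$ will satisfy

$$\Pr_{Y \sim \Theta'} \left[ \Im'(Y) \ge c - \eta \right] \ge 1 - e^{-\Omega(\eta N)}.$$

Hence, if we sample $Y^{(1)}, \ldots, Y^{(R)} \sim \Theta'$ then the average value of the assignments is at least $c - \eta$
with probability $1 - e^{-\Omega(\eta N)}$. We are ready to finish the argument as shown below,

$$\begin{aligned}
\mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{X^{(1)}, \ldots, X^{(R)} \sim \Theta} \left[ \Im(p(X^{(1)}, \ldots, X^{(R)})) \right]
&= \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{Y^{(1)}, \ldots, Y^{(R)} \sim \Theta'} \left[ \Im'(p(Y^{(1)}, \ldots, Y^{(R)})) \right] \quad (\text{linearity of expectation}) \\
&\ge \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E} \left[ \Im'(p(Y^{(1)}, \ldots, Y^{(R)})) \,\middle|\, \Im'(Y^{(i)}) \ge c - \eta \right] - e^{-\Omega(\eta N)} \\
&\ge s - e^{-\Omega(\eta N)}.
\end{aligned}$$

Taking limits as $N \to \infty$, the conclusion is immediate. □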

To analyze the performance of the rounding scheme, we define a set of ensembles of local integral
random variables, and global Gaussian ensembles as follows.

Definition 7.12. For every payoff $C \in \mathcal{C}$ of size at most $k$, the local distribution $\mu_C$ is a distribution
over $[q]^{V(C)}$. In other words, the distribution $\mu_C$ is a distribution over assignments to the CSP
variables in the set $V(C)$. The corresponding local integral ensemble is a set of random variables
$\mathcal{L}_C = \{\ell_{v_1}, \ldots, \ell_{v_k}\}$ each taking values in $\Delta_q$.

Definition 7.13. The global ensemble $\mathcal{G} = \{g_v \mid v \in V\}$ is generated by setting
$g_v = \{g_{v,1}, \ldots, g_{v,q}\}$ where

$$g_{v,j} = \langle I, b_{v,j} \rangle + \langle (b_{v,j} - \langle I, b_{v,j} \rangle I), \zeta \rangle$$

and $\zeta$ is a normal Gaussian random vector of appropriate dimension.

It is easy to see that the local integral ensembles and the global ensemble have matching moments up to
degree two.

Observation 7.14. For any set $C \in \mathcal{C}$, the global ensemble $\mathcal{G}$ matches the following moments of
the local integral ensemble $\mathcal{L}_C$:

$$\mathbb{E}[g_{v,j}] = \mathbb{E}[\ell_{v,j}] = \langle I, b_{v,j} \rangle, \qquad \mathbb{E}[g_{v,j}^2] = \mathbb{E}[\ell_{v,j}^2] = \langle I, b_{v,j} \rangle,$$

$$\mathbb{E}[g_{v,j}\, g_{v,j'}] = \mathbb{E}[\ell_{v,j}\, \ell_{v,j'}] = 0 \quad \forall j \ne j',\ v \in V(C).$$

We will appeal to the invariance principle of Mossel et al. [MOO05] and Isaksson and Mossel [IM09]
to argue that a $(\tau, d)$-quasirandom polymorphism cannot distinguish between two distributions that
have the same first two moments.

Theorem 7.15. (Invariance Principle [IM09]) Let $\Omega$ be a finite probability space with the least
non-zero probability of an atom at least $\alpha \le 1/2$. Let $\mathcal{L} = \{\ell_1, \ldots, \ell_m\}$ be an ensemble of random
variables over $\Omega$. Let $\mathcal{G} = \{g_1, \ldots, g_m\}$ be an ensemble of Gaussian random variables satisfying the
following conditions:

$$\mathbb{E}[\ell_i] = \mathbb{E}[g_i], \qquad \mathbb{E}[\ell_i^2] = \mathbb{E}[g_i^2], \qquad \mathbb{E}[\ell_i \ell_j] = \mathbb{E}[g_i g_j] \quad \forall i, j \in [m].$$

Let $K = \log(1/\alpha)$. Let $F = (F_1, \ldots, F_D)$ denote a vector valued multilinear polynomial and let
$H_i = (T_{1-\varepsilon} F_i)$ and $H = (H_1, \ldots, H_D)$. Further let $\mathrm{Inf}_i(H) \le \tau$ and $\mathrm{Var}[H_j] \le 1$ for all $i$ and $j$.

If $\Psi : \mathbb{R}^D \to \mathbb{R}$ is a Lipschitz-continuous function with Lipschitz constant $C_0$ (with respect to the
$L_2$ norm), then

$$\left| \mathbb{E}\left[ \Psi(H(\mathcal{L}^R)) \right] - \mathbb{E}\left[ \Psi(H(\mathcal{G}^R)) \right] \right| \le C_D \cdot C_0 \cdot \tau^{\varepsilon / 18K} = o_\tau(1)$$

for some constant $C_D$ depending on $D$.
The local integral ensemble has an expected payoff equal to $c$. The local and global ensembles
agree on the first two moments. The $(c, s)$-approximate polymorphism outputs a solution of value
at least $s$ when given the local integral ensemble as inputs. Hence, by the invariance principle, when
the global ensemble of random projections is given as input to the polymorphism $\mathcal{P}$, it will end up
outputting close to integral solutions of value approximately $s$.

Theorem 7.16. For all $\eta$ and CSP $\Lambda$, there exists $\tau, d > 0$ such that the following holds: Suppose
$\mathcal{P}$ is a $(c - \eta, s)$-approximate $(\tau, d)$-quasirandom polymorphism for Max-$\Lambda$. Given a BasicSDP
solution with objective value at least $c$, the expected value of the assignment produced by the Round$_{\mathcal{P}}$
algorithm is at least $s - \eta$.

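Proof. Let $\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu)$ denote the expected payoff of the assignment produced by the rounding
scheme Round$_{\mathcal{P}}$ on the SDP solution $(\mathbf{V}, \mu)$ for the $\Lambda$-CSP instance $\Im$. Notice that the $g_v$ are nothing
but samples of the global ensemble $\mathcal{G}$ associated with $\Im$. For each constraint $C \in \mathcal{C}$, let $\mathcal{G}_C$ denote
the subset of random variables from the global Gaussian ensemble that correspond to variables in $V(C)$.
It will be useful to extend each payoff $C : [q]^k \to [-1,1]$ to the domain of fractional/distributional
assignments $\Delta_q$. Specifically, for $z_1, \ldots, z_k \in \Delta_q$, define $C(z_1, \ldots, z_k) = \mathbb{E}_{x_i \sim z_i}[C(x_1, \ldots, x_k)]$, which
corresponds to the expected payoff on fixing each input $x_i$ independently from the corresponding
distribution $z_i \in \Delta_q$. By definition, the expected payoff is given by

$$\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu) = \mathbb{E}_{C \in \mathcal{C}} \left[ C(z^*_{v_1}, \ldots, z^*_{v_k}) \right] \qquad (7.1)$$

$$= \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{g^R} \left[ C\left( f_\Delta(H(g^R_{v_1})), \ldots, f_\Delta(H(g^R_{v_k})) \right) \right]. \qquad (7.2)$$

Fix a payoff $C \in \mathcal{C}$. Let $\Psi_C : \mathbb{R}^{qk} \to \mathbb{R}$ be a Lipschitz-continuous function defined as follows:

$$\Psi_C(z_1, z_2, \ldots, z_k) = C\left( f_\Delta(z_1), \ldots, f_\Delta(z_k) \right) \quad \forall z_1, \ldots, z_k.$$

Hence, we can write

$$\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu) = \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{g^R} \left[ \Psi_C\left( H(g^R_{v_1}), \ldots, H(g^R_{v_k}) \right) \right]. \qquad (7.3)$$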

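For a constraint $C$, there are $k$ different marginal distributions in $\mu_C$. Note that since $\mathcal{P}$ is
$(\tau, d)$-quasirandom, with probability at least $1 - k\sqrt{\tau}$ over the choice of $p \sim \mathcal{P}$, we will have

$$\max_i \mathrm{Inf}^{<d}_{i,\mu_j}(p) \le \sqrt{\tau}$$

for each of the marginals $\mu_j$ within $\mu_C$. Since the objective value is always between $[-1,1]$, we get
that

$$\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu) \ge \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{g^R} \left[ \Psi_C\left( H(g^R_{v_1}), \ldots, H(g^R_{v_k}) \right) \,\middle|\, \max_{i \in [R],\, j \in [k]} \mathrm{Inf}^{<d}_{i,\mu_j}(p) \le \sqrt{\tau} \right] - k\sqrt{\tau}.$$

Fix $d = (20/\varepsilon) \log(1/\tau)$. For every such $p$ and the polynomial $H = T_{1-\varepsilon}\, p$, we can conclude
that

$$\begin{aligned}
\max_{i \in [R],\, j \in [k]} \mathrm{Inf}_{i,\mu_j}(H) &\le \max_{i \in [R],\, j \in [k]} \mathrm{Inf}^{<d}_{i,\mu_j}(T_{1-\varepsilon}\, p) + (1 - \varepsilon)^d \\
&\le \max_{i \in [R],\, j \in [k]} \mathrm{Inf}^{<d}_{i,\mu_j}(p) + (1 - \varepsilon)^d \\
&\le \sqrt{\tau} + (1 - \varepsilon)^d \le 2\sqrt{\tau}.
\end{aligned}$$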
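Applying the invariance principle (Theorem 7.15) with the ensembles $\mathcal{L}_C$, $\mathcal{G}_C$, the Lipschitz-continuous
functional $\Psi_C$ and the vector of $kq$ multilinear polynomials given by $(H, H, \ldots, H)$ where
$H = (H_1, \ldots, H_q)$, we get the required result:

$$\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu) \ge \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E} \left[ \Psi_C\left( H(\ell^R_{v_1}), \ldots, H(\ell^R_{v_k}) \right) \right] - o_\tau(1)$$

where $o_\tau(1)$ is a function that tends to zero as $\tau \to 0$. Since the local ensembles $\mathcal{L}_C$ correspond
to the local distributions $\mu_C$ over partial assignments and $H = T_{1-\varepsilon}\, p$, we can rewrite the
expression as

$$\mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E} \left[ \Psi_C\left( H(\ell^R_{v_1}), \ldots, H(\ell^R_{v_k}) \right) \right] = \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R} \left[ \Psi_C\left( T_{1-\varepsilon}\, p(X_1), \ldots, T_{1-\varepsilon}\, p(X_k) \right) \right]. \qquad (7.4)$$

By definition of $\Psi_C$, $\Psi_C(z_1, \ldots, z_k) = C(z_1, \ldots, z_k)$ if $z_i \in \Delta_q$ for all $i$. Summarizing our calculation so
far, we have

$$\mathrm{Round}_{\mathcal{P}}(\mathbf{V}, \mu) \ge \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R} \left[ C\left( T_{1-\varepsilon}\, p(X_1), \ldots, T_{1-\varepsilon}\, p(X_k) \right) \right] - o_\tau(1).$$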
The proof is complete with the following claim which we will prove in the rest of the section. 

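Claim 7.17.

$$\mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R} \left[ C\left( T_{1-\varepsilon}\, p(X_1), \ldots, T_{1-\varepsilon}\, p(X_k) \right) \right] \ge s.$$

For each fixed $\eta$ and $\varepsilon = \eta/k$, the error term $\kappa(\tau, d) \to 0$ as $d \to \infty$ and $\tau \to 0$. Therefore for a
sufficiently large $d$ and sufficiently small $\tau$, the error will be smaller than $\eta$.

□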


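Proof. (Proof of Claim 7.17) For $X \in [q]^R$, let $Z \sim_{1-\varepsilon} X$ denote the random variable distributed
over $[q]^R$ obtained by resampling each coordinate of $X$ from its underlying distribution with probability
$\varepsilon$. Notice that $T_{1-\varepsilon}\, p(X_i) \in \Delta_q$ is a fractional assignment. To sample a value $y \in [q]$ from
$T_{1-\varepsilon}\, p(X_i)$, we could instead sample $Z_i \sim_{1-\varepsilon} X_i$ and compute $p(Z_i)$. In other words, $y \sim T_{1-\varepsilon}\, p(X_i)$
is the same as $y = p(Z_i)$, $Z_i \sim_{1-\varepsilon} X_i$. Using the way we defined the payoff function $C$ over fractional
assignments, we get

$$\mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R} \left[ C\left( T_{1-\varepsilon}\, p(X_1), \ldots, T_{1-\varepsilon}\, p(X_k) \right) \right]
= \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R}\ \mathbb{E}_{Z_i \sim_{1-\varepsilon} X_i} \left[ C\left( p(Z_1), \ldots, p(Z_k) \right) \right]. \qquad (7.5)$$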

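To lower bound the expression, construct an instance $\Im'$ that consists of the constraints in $\Im$ but
each over a disjoint set of variables. Consider the distribution $\Theta$ for assignments of $\Im'$ wherein the
variables corresponding to each constraint $C \in \mathcal{C}$ are sampled independently from $\mu_C$. Let $\Theta'$ be
the $(1 - \varepsilon)$-noisy version of $\Theta$ in that it is obtained by sampling from $\Theta$ and rerandomizing each
coordinate with probability $\varepsilon$.

Notice that

$$\begin{aligned}
\mathbb{E}_{Y \sim \Theta'} \left[ \Im'(Y) \right] &= \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{x \sim \mu_C}\ \mathbb{E}_{z \sim_{1-\varepsilon} x} \left[ C(z_1, \ldots, z_k) \right] \\
&\ge \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{x_1, \ldots, x_k \sim \mu_C} \left[ C(x_1, \ldots, x_k) \right] - k\varepsilon \\
&\ge c - k\varepsilon
\end{aligned}$$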
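where in the last inequality we used the fact that the objective value of the SDP is at least $c$. Since
$\mathcal{P}$ is a $(c - \eta, s)$-approximate polymorphism and $\eta > k\varepsilon$, we can appeal to Lemma 7.11 to conclude
that

$$s \le \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{Y^{(1)}, \ldots, Y^{(R)} \sim \Theta'} \left[ \Im'(p(Y^{(1)}, \ldots, Y^{(R)})) \right]
= \mathbb{E}_{p \in \mathcal{P}}\ \mathbb{E}_{C \in \mathcal{C}}\ \mathbb{E}_{(X_1, \ldots, X_k) \sim \mu_C^R}\ \mathbb{E}_{Z_i \sim_{1-\varepsilon} X_i} \left[ C\left( p(Z_1), \ldots, p(Z_k) \right) \right]. \qquad (7.6)$$

The claim is immediate from (7.5) and (7.6). □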

Proof. (Proof of Theorem 1.12) By definition of $s_\Lambda$, there exists a $(c, s_\Lambda(c))$-approximate polymorphism
with degree $d$ influence less than $\tau$ for all $d, \tau$. Fix any particular instance $\Im$ of Max-$\Lambda$. By
applying the rounding scheme Round$_{\mathcal{P}}$ on a sequence of polymorphisms with influence $\tau \to 0$ and
$\eta \to 0$, we get that the SDP integrality gap on $\Im$ is at most $\lim_{\eta \to 0} s_\Lambda(c - \eta)$. □


7.4 Approximate Polymorphisms and Dictatorship Tests 

In this section, we will sketch the proof of Theorem 1.11. 

Proof. (Proof of Theorem 1.11) For each $\eta, \varepsilon > 0$, there exist $\tau_0, d_0$ such that there are no
$(c + \varepsilon, s_\Lambda(c + \varepsilon) + \eta)$-approximate $(\tau_0, d_0)$-quasirandom polymorphisms for Max-$\Lambda$. By Theorem 7.18, this
implies a unique games hardness to $(c, s_\Lambda(c + \varepsilon) + \eta)$-approximate Max-$\Lambda$. By making $\varepsilon \to 0$ and
$\eta \to 0$, we get a unique games based hardness for $(c, \lim_{\varepsilon \to 0} s_\Lambda(c + \varepsilon))$-approximating Max-$\Lambda$.

By Observation 7.10, we have

$$\lim_{\varepsilon \to 0} s_\Lambda(c + \varepsilon) \le s_\Lambda(c).$$

Hence, the conclusion of Theorem 1.11 follows. □

Theorem 7.18. Fix constants $\tau, c$ and $d \in \mathbb{N}$. Suppose Max-$\Lambda$ does not admit a $(c, s)$-approximate
$(\tau, d)$-quasirandom polymorphism. Then for all $\eta > 0$, it is unique games hard to $(s + \eta)$-approximate
instances of Max-$\Lambda$ on instances with value at least $c - \eta$.

Proof. Fix an integer $R \in \mathbb{N}$. For all $c, s$, the set of $(c, s)$-approximate polymorphisms $\mathcal{P}$ of arity $R$
forms a convex set. A convex combination of two approximate polymorphisms $\mathcal{P}_1$ and $\mathcal{P}_2$ is constructed
by taking a convex combination of the underlying distributions over operations. For every
$(c, s)$-approximate polymorphism $\mathcal{P}$, there exists a probability distribution $\mu$ such that

$$\mathbb{E}_{p \in \mathcal{P}} \left[ \max_{i \in [R]} \mathrm{Inf}^{<d}_{i,\mu}(p) \right] > \tau.$$

By the min-max theorem, there exists a distribution $\Phi$ over the simplex $\Delta_q$ such that for every
$(c, s)$-approximate polymorphism $\mathcal{P}$,

$$\mathbb{E}_{\mu \sim \Phi}\ \mathbb{E}_{p \in \mathcal{P}} \left[ \max_{i \in [R]} \mathrm{Inf}^{<d}_{i,\mu}(p) \right] > \tau. \qquad (7.7)$$

Define the family of functions $\mathcal{F}^{\Phi,R}_{\tau,d}$ as follows,

$$\mathcal{F}^{\Phi,R}_{\tau,d} = \left\{ p : [q]^R \to [q] \,\middle|\, \mathbb{E}_{\mu \sim \Phi} \left[ \max_i \mathrm{Inf}^{<d}_{i,\mu}(p) \right] \le \tau \right\}.$$

By (7.7), there do not exist $(c, s)$-approximate polymorphisms supported in $\mathcal{F}^{\Phi,R}_{\tau,d}$. By the
connection between approximate polymorphisms and dictatorship tests outlined in Theorem 7.3,
this implies that there exist $(c, s)$-dictatorship tests against the family $\mathcal{F}^{\Phi,R}_{\tau,d}$ for each $R \in \mathbb{N}$. Let
$\Im^{\Phi,R}_{\tau,d}$ denote the $(c, s)$-dictatorship test against the family $\mathcal{F}^{\Phi,R}_{\tau,d}$.
These dictatorship gadgets can be utilized towards carrying out the unique games based hardness
reduction. To this end, we will need to show a stronger soundness guarantee for the dictatorship
test. Specifically, we will have to show a soundness guarantee against functions that output
distributions over $[q]$ (points in $\Delta_q$) instead of elements of $[q]$. The details are described below.

First, we will extend the objective function associated with an instance $\Im$ of Max-$\Lambda$ from
$[q]$-valued assignments to $\Delta_q$-valued assignments. Given a fractional assignment $z : V \to \Delta_q$, let
$z^\times$ denote the product distribution over $[q]^V$ whose marginals are given by $z$. Define $\Im : \Delta_q^V \to \mathbb{R}$ as
follows,

$$\Im(z) \stackrel{\mathrm{def}}{=} \mathbb{E}_{x \sim z^\times} \left[ \Im(x) \right].$$
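
The extension to fractional assignments is simply an expectation over a product distribution, as the following sketch shows by exhaustive enumeration. The instance layout (payoff, variable indices, weight) and the name `fractional_value` are assumptions made for illustration.

```python
from itertools import product


def neq(a, b):
    """Hypothetical payoff: +1 if the two values differ, -1 otherwise."""
    return 1.0 if a != b else -1.0


def fractional_value(constraints, z):
    """Expected objective when each variable v is drawn independently from z[v].

    z is a list of probability vectors, one per variable, each of length q.
    """
    n, q = len(z), len(z[0])
    total = 0.0
    for x in product(range(q), repeat=n):
        prob = 1.0
        for v in range(n):
            prob *= z[v][x[v]]
        payoff = sum(w * c(*(x[i] for i in idx)) for c, idx, w in constraints)
        total += prob * payoff
    return total


# One constraint on two variables with weight 1.
constraints = [(neq, (0, 1), 1.0)]
```

When every `z[v]` is a point mass, the value reduces to the ordinary objective of the corresponding integral assignment, so the extension agrees with the original definition on $[q]$-valued assignments.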

Notice that the definitions of influences and low-degree influences extend naturally to $\Delta_q$-valued
functions. In fact, even for $[q]$-valued functions these notions were defined by expressing them as
$\Delta_q$-valued functions. We will show that quasirandom fractional assignments have no larger objective
value than quasirandom $[q]$-valued assignments. Specifically, we will show the following.


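Lemma 7.19. For every $\epsilon > 0$, the following holds for all sufficiently large $R \in \mathbb{N}$. For every 
function $p : [q]^R \to \blacktriangle_q$ such that $\mathbb{E}_{\mu \sim \Phi} \max_i \mathrm{Inf}^{<d}_{i,\mu}(p) \le \tau/5$, 

$$\Im^{\Phi,R}_{\tau,d}(p) \le s + \epsilon.$$ 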


Proof. The idea is to create a rounded operation $r : [q]^R \to [q]$ by setting, for each $x \in [q]^R$, 

$$r(x) = \text{random sample from distribution } p(x).$$ 

For notational convenience, we will drop the subscripts and superscripts and write $\Im$ for $\Im^{\Phi,R}_{\tau,d}$. It 
is easy to see that by definition, 

$$\mathbb{E}_{r}\big[\Im(r(x^{(1)}, \ldots, x^{(R)}))\big] = \Im(p(x^{(1)}, \ldots, x^{(R)})).$$ 

The technical core of the argument shows that if $p$ is quasirandom then the rounded function $r$ is 
quasirandom with high probability. This claim is formally stated in Lemma 7.20. By Lemma 7.20, 
for all sufficiently large $R \in \mathbb{N}$, 

$$\mathbb{P}_{r}\Big[\mathbb{E}_{\mu \sim \Phi}\big[\max_i \mathrm{Inf}^{<d}_{i,\mu}(r)\big] \le \tau\Big] \ge 1 - \frac{\epsilon}{R\tau}.$$ 

Call a rounded function $r : [q]^R \to [q]$ quasirandom if $\mathbb{E}_{\mu \sim \Phi}[\max_i \mathrm{Inf}^{<d}_{i,\mu}(r)] \le \tau$. Then 

$$\mathbb{E}_{r \sim p}\big[\Im(r) \mid r \text{ is quasirandom}\big] \ge \mathbb{E}_{r \sim p}[\Im(r)] - \mathbb{P}_{r \sim p}[r \text{ is not quasirandom}] \ge \Im(p) - \frac{\epsilon}{R\tau}.$$ 

For a quasirandom function $r : [q]^R \to [q]$, the dictatorship test $\Im$ satisfies $\Im(r) \le s$. This implies 
that $\Im(p) \le s + \epsilon$. □ 
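
The rounding step in this proof can be illustrated with a small simulation. The sketch below assumes that $p$ is stored as a dictionary mapping each input $x \in [q]^R$ to a probability vector; the names `round_function` and `empirical_marginals` and the toy data are hypothetical. Averaging the indicator of $r(x)$ over many independent roundings recovers $p(x)$ pointwise, which is the fact underlying the identity between the expected value of the rounded function and the value of $p$.

```python
import itertools
import random

def round_function(p, rng):
    """Sample r : [q]^R -> [q] by drawing each r(x) independently from p(x)."""
    return {x: rng.choices(range(len(dist)), weights=dist)[0] for x, dist in p.items()}

def empirical_marginals(p, trials, rng):
    """Average the indicator vectors of r(x) over independent roundings of p."""
    counts = {x: [0] * len(dist) for x, dist in p.items()}
    for _ in range(trials):
        r = round_function(p, rng)
        for x, label in r.items():
            counts[x][label] += 1
    return {x: [c / trials for c in cs] for x, cs in counts.items()}

q, R = 3, 2
rng = random.Random(0)
# Toy fractional function p : [q]^R -> simplex (hypothetical example data).
p = {x: [0.5, 0.3, 0.2] for x in itertools.product(range(q), repeat=R)}
est = empirical_marginals(p, trials=20000, rng=rng)
max_err = max(abs(est[x][a] - p[x][a]) for x in p for a in range(q))
```

The simulation only checks the marginals of the rounding; it does not compute influences or evaluate a dictatorship test.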

Now, we will outline the details of the unique games hardness reduction. Given a unique games 
instance $\Gamma = (A, B, E, \Pi, [R])$, the reduction produces an instance $\Im_\Gamma$ of Max-$\Lambda$. The variables of 
$\Im_\Gamma$ are $B \times [q]^R$. The constraints of $\Im_\Gamma$ can be sampled using the following procedure. 

• Sample $a \in A$ and neighbors $b_1, \ldots, b_k \in B$ independently at random. 

• Sample a constraint $C$ from the dictatorship test $\Im^{\Phi,R}_{\tau,d}$. Suppose the constraint is on $x^{(1)}, \ldots, x^{(k)} \in [q]^R$. 

• Introduce the constraint $C$ on $\{(b_i, \pi_{ab_i} \circ x^{(i)}) \mid i \in [k]\}$. 

Given an assignment $L : B \times [q]^R \to [q]$, its value is given by, 

$$\Im_\Gamma(L) = \mathbb{E}_{a \in A}\; \mathbb{E}_{b_1, \ldots, b_k \in N_\Gamma(a)}\; \mathbb{E}_{C \in \Im^{\Phi,R}_{\tau,d}} \Big[ C\big(L(b_1, \pi_{ab_1} \circ x^{(1)}), \ldots, L(b_k, \pi_{ab_k} \circ x^{(k)})\big) \Big]. \qquad (7.8)$$ 
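
The sampling procedure above can be sketched in code. This is a minimal illustration under several assumptions that are not fixed by the text: the vertex $a$ is drawn uniformly, neighbors are stored in a dictionary, each permutation is a list over `range(R)`, the composition convention is `(pi∘x)[j] = x[pi[j]]`, and the dictatorship test is a list of `(payoff, points)` pairs. All function and variable names are hypothetical.

```python
import random

def compose(pi, x):
    """Return pi∘x under the assumed convention (pi∘x)[j] = x[pi[j]]."""
    return tuple(x[pi[j]] for j in range(len(x)))

def sample_constraint(neighbors, perms, test, rng):
    """Sample one constraint of the reduced instance.

    neighbors: dict mapping a -> list of neighbors b
    perms:     dict mapping (a, b) -> permutation of range(R), given as a list
    test:      list of (payoff, points) pairs; points are k vectors in [q]^R
    Returns (payoff, variables), where variables are the pairs (b_i, pi_{a b_i}∘x^(i)).
    """
    a = rng.choice(sorted(neighbors))                # assumed: a is uniform over A
    payoff, points = rng.choice(test)                # constraint of the dictatorship test
    bs = [rng.choice(neighbors[a]) for _ in points]  # k independent neighbors of a
    variables = [(b, compose(perms[(a, b)], x)) for b, x in zip(bs, points)]
    return payoff, variables

# Toy data (hypothetical): one vertex a0 with two neighbors, R = 2, k = 2.
rng = random.Random(1)
neighbors = {"a0": ["b0", "b1"]}
perms = {("a0", "b0"): [0, 1], ("a0", "b1"): [1, 0]}
test = [("equal", [(0, 1), (0, 1)])]
payoff, variables = sample_constraint(neighbors, perms, test, rng)
```

The value of an assignment $L$ is then the expected payoff of a sampled constraint evaluated on the labels $L$ gives to the returned variables.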


Completeness. Suppose $\mathcal{L} : A \cup B \to [R]$ is a labelling satisfying a $(1 - \delta)$-fraction of constraints. 
Consider the labelling $L : B \times [q]^R \to [q]$ given by $L(b, x) = x_{\mathcal{L}(b)}$. Over the choice of $a, b_1, \ldots, b_k$, 
with probability at least $(1 - \delta k)$, the labelling $\mathcal{L}$ satisfies each of the $k$ edges $\{(a, b_i) \mid i \in [k]\}$. In 
this case, the objective value is at least the completeness $c$ of the dictatorship test $\Im^{\Phi,R}_{\tau,d}$. With the 
remaining probability, the objective value is at least $-1$. This implies that the labelling $L$ has an 
objective value $\ge c - 2k\delta$. 


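Soundness. Suppose $L : B \times [q]^R \to [q]$ is a labelling with an objective value $s + \eta$. For each 
$b \in B$, set $L_b : [q]^R \to \blacktriangle_q$ as $L_b(x) = e_{L(b,x)}$, where $e_{L(b,x)} \in \blacktriangle_q$ corresponds to the basis vector 
along direction $L(b, x)$. For each $a \in A$, define the fractional assignment $L_a : [q]^R \to \blacktriangle_q$ as 

$$L_a(x) \stackrel{\mathrm{def}}{=} \mathbb{E}_{b \in N_\Gamma(a)}\big[L_b(\pi_{ab} \circ x)\big].$$ 

The objective value in (7.8) can be written as, 

$$\Im_\Gamma(L) = \mathbb{E}_{a \in A}\Big[\Im^{\Phi,R}_{\tau,d}(L_a)\Big].$$ 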

Decode an assignment $\ell : A \cup B \to [R]$ as follows. Sample a distribution $\mu \sim \Phi$. For each $a \in A$, 
define the label set 

$$T_a = \{ i \mid \mathrm{Inf}^{<d}_{i,\mu}(L_a) \ge \tau/10 \}$$ 

and for each $b \in B$, define the label set 

$$T_b = \{ j \mid \mathrm{Inf}^{<d}_{j,\mu}(L_b) \ge \tau/20 \}.$$ 

Since the sum of degree $d$ influences is at most $d$, we have $|T_a| \le 10d/\tau$ and $|T_b| \le 20d/\tau$ for all 
$a \in A$ and $b \in B$. 

For each $a \in A$, assign $\ell(a)$ to be a random label from $T_a$ if it is non-empty, else assign an 
arbitrary label from $[R]$. Similarly, for each $b \in B$, set $\ell(b)$ to be a random label from $T_b$ if it is 
non-empty, else assign an arbitrary label from $[R]$. 
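
The decoding rule can be sketched as follows. The sketch assumes the low-degree influences of each vertex function have already been computed and are passed in as a list; computing them is not shown. The names `label_set` and `decode_label`, the fallback label `0`, and the numeric values are hypothetical choices made for illustration.

```python
import random

def label_set(influences, threshold):
    """Coordinates whose low-degree influence is at least the threshold."""
    return [i for i, inf in enumerate(influences) if inf >= threshold]

def decode_label(influences, threshold, rng):
    """Random label from the label set; a fixed fallback label if the set is empty."""
    candidates = label_set(influences, threshold)
    return rng.choice(candidates) if candidates else 0

rng = random.Random(0)
tau, d = 0.5, 2
# Hypothetical degree-d influences of L_a for R = 4 coordinates (their sum is at most d).
influences_a = [0.6, 0.01, 0.3, 0.0]
T_a = label_set(influences_a, tau / 10)
label_a = decode_label(influences_a, tau / 10, rng)
```

Because the influences sum to at most $d$, at most $10d/\tau$ coordinates can clear the threshold $\tau/10$, which is the size bound used in the analysis.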

If the objective value of the assignment $L$ is more than $s + \eta$, then for at least an $\eta/2$ fraction of 
$a \in A$, we have $\Im^{\Phi,R}_{\tau,d}(L_a) > s + \eta/2$. Call such a vertex $a \in A$ good. Fix a good vertex $a \in A$. 
For every good vertex, by the soundness of the dictatorship test we will have, 

$$\mathbb{E}_{\mu \sim \Phi}\Big[\max_i \mathrm{Inf}^{<d}_{i,\mu}(L_a)\Big] > \tau.$$ 

This implies that over the choice of $\mu \sim \Phi$, with probability at least $\tau/2$, we will have $\max_i \mathrm{Inf}^{<d}_{i,\mu}(L_a) \ge \tau/2$, 
which implies that $T_a$ is non-empty. Let $i \in T_a$ be a coordinate attaining this maximum, so that 
$\mathrm{Inf}^{<d}_{i,\mu}(L_a) \ge \tau/2$. Using the fact that $L_a = \mathbb{E}_{b \in N_\Gamma(a)}[\pi_{ab} \circ L_b]$ and convexity of influences we have, 

$$\mathbb{E}_{b \in N_\Gamma(a)}\Big[\mathrm{Inf}^{<d}_{\pi_{ab}(i),\mu}(L_b)\Big] \ge \tau/2.$$ 

This implies that for at least a $\tau/4$-fraction of neighbors $b$, we will have $\mathrm{Inf}^{<d}_{\pi_{ab}(i),\mu}(L_b) \ge \tau/4$, i.e., 
$\pi_{ab}(i) \in T_b$. For every such neighbor $b$, the labelling $\ell$ satisfies the edge $(a, b)$ with probability at 
least $\frac{1}{|T_a|} \cdot \frac{1}{|T_b|} \ge \tau^2/200d^2$. 

Hence, the fraction of edges satisfied by the labelling $\ell$ is at least $(\eta/2) \cdot (\tau/2) \cdot (\tau/4) \cdot (\tau^2/200d^2)$ 
in expectation. By fixing an alphabet size $R$ large enough, the soundness $\delta$ of the unique games 
instance can be made smaller than this fraction. 

□ 
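
To make the final bound of the soundness analysis concrete, the short sketch below multiplies out the four factors with exact rational arithmetic. The function name and the sample parameter values are hypothetical; the product simplifies to $\eta\tau^4/(3200 d^2)$, which is the fraction of satisfied edges that the unique games soundness must fall below.

```python
from fractions import Fraction

def decoded_fraction_lower_bound(eta, tau, d):
    """Product (eta/2) * (tau/2) * (tau/4) * (tau^2 / (200 d^2)) from the soundness analysis."""
    return (eta / 2) * (tau / 2) * (tau / 4) * (tau ** 2 / (200 * d ** 2))

# Sample parameters (hypothetical), kept as exact fractions.
eta, tau, d = Fraction(1, 10), Fraction(1, 10), 2
bound = decoded_fraction_lower_bound(eta, tau, d)
```

The bound depends only on $\eta$, $\tau$ and $d$, not on $R$, which is why the alphabet size can be chosen afterwards.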



7.5 Quasirandomness under Randomized Rounding 

This section is devoted to showing the following lemma which was used in the proof of Theorem 7.18. 

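Lemma 7.20. For every $\tau, d, \epsilon$, the following holds for all large enough $R \in \mathbb{N}$: Suppose $p : [q]^R \to \blacktriangle_q$ 
and a distribution $\Phi$ over $\blacktriangle_q$ are such that 

$$\mathbb{E}_{\mu \sim \Phi}\Big[\max_i \mathrm{Inf}^{<d}_{i,\mu}(p)\Big] \le \tau.$$ 

Then if $r : [q]^R \to [q]$ is sampled by setting each $r(x) \in [q]$ independently from the distribution $p(x)$, 

$$\mathbb{P}_{r}\Big( \mathbb{E}_{\mu \sim \Phi}\Big[\max_i \mathrm{Inf}^{<d}_{i,\mu}(r)\Big] \le 5\tau \Big) \ge 1 - \frac{\epsilon}{R\tau}.$$ 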

To this end, we first show a few simple facts concerning concentration under random sampling. 

Random Sampling. 

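Lemma 7.21. Given $p : [q]^R \to \blacktriangle_q$, let $r : [q]^R \to [q]$ denote a function sampled by setting 
$r(x) \in [q]$ from the distribution $p(x) \in \blacktriangle_q$. Fix a probability distribution $\mu \in \blacktriangle_q$ and a corresponding 
orthonormal basis $\{\chi_\sigma\}_{\sigma \in [q]^R}$ for $L^2([q]^R, \mu^R)$. For every multi-index $\sigma \in [q]^R$, 

$$\mathbb{E}_{r}[\hat{r}_\sigma] = \hat{p}_\sigma,$$ 

and 

$$\mathbb{P}_{r}\big[|\hat{r}_\sigma - \hat{p}_\sigma| > \beta\big] \le 2\exp\Big(-\frac{\beta^2}{(\max_i \mu(i))^R}\Big).$$ 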

Proof. Fix a multi-index $\sigma \in [q]^R$. Let $\chi_\sigma : [q]^R \to \mathbb{R}$ denote the corresponding basis function for 
$L^2([q]^R, \mu^R)$. Let $\mu^R(x)$ denote the probability of $x \in [q]^R$ under the product distribution $\mu^R$. By 
definition, 

$$\hat{r}_\sigma = \sum_{x \in [q]^R} \mu^R(x)\chi_\sigma(x) r(x),$$ 

where $\{r(x)\}_{x \in [q]^R}$ are independent random variables whose expectation is given by $\mathbb{E}\, r(x) = p(x)$. 
For every $x$, the random vector $\mu^R(x)\chi_\sigma(x)r(x)$ has entries lying between $0$ and $\mu^R(x)\chi_\sigma(x)$. Now we 
appeal to Hoeffding's inequality stated below. 

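Lemma 7.22. (Hoeffding's inequality) Given independent real valued random variables $X_1, \ldots, X_n$ 
such that $X_i \in [b_i, a_i]$, we have 

$$\mathbb{P}\Big[\Big|\sum_i X_i - \mathbb{E}\sum_i X_i\Big| \ge t\Big] \le 2\exp\Big(-\frac{2t^2}{\sum_i (a_i - b_i)^2}\Big).$$ 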
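
As a sanity check on the inequality, the sketch below compares the standard two-sided Hoeffding bound with an exactly computed tail for a sum of fair coin flips, where each variable lies in $[0, 1]$. The function names and the choice $n = 20$ are hypothetical; the example is only an illustration of the inequality, not part of the argument.

```python
import math

def hoeffding_bound(t, widths):
    """Two-sided Hoeffding bound 2 * exp(-2 t^2 / sum of squared interval widths)."""
    return 2.0 * math.exp(-2.0 * t * t / sum(w * w for w in widths))

def fair_coin_tail(n, t):
    """Exact P[|S - n/2| >= t] for S a sum of n independent fair coin flips."""
    hits = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= t)
    return hits / 2 ** n

n = 20
# Triples (t, exact tail, Hoeffding bound) for a range of deviations t.
checks = [(t, fair_coin_tail(n, t), hoeffding_bound(t, [1.0] * n)) for t in range(1, 11)]
```

For every deviation in the list, the exact tail is below the bound, as the inequality guarantees.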

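By Hoeffding's inequality, we will have 

$$\mathbb{P}\big[|\hat{r}_\sigma - \hat{p}_\sigma| > \beta\big] \le 2\exp\Big(-\frac{2\beta^2}{\sum_x \mu^R(x)^2\chi_\sigma(x)^2}\Big) \le 2\exp\Big(-\frac{\beta^2}{\sum_x \mu^R(x)^2\chi_\sigma(x)^2}\Big).$$ 

The proof is complete once we observe that, since $\chi_\sigma$ has unit norm in $L^2([q]^R, \mu^R)$, 

$$\sum_x \mu^R(x)^2\chi_\sigma(x)^2 \le \max_x \mu^R(x) \cdot \sum_x \mu^R(x)\chi_\sigma^2(x) = \max_x \mu^R(x) \le \big(\max_i \mu(i)\big)^R.$$ 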

□ 

Lemma 7.23. Given a function $r : [q]^R \to \mathbb{R}^q$ and a probability distribution $\mu \in \blacktriangle_q$, for all $i \in [R]$ 
and $D \in \mathbb{N}$, 

$$\mathrm{Inf}^{<D}_{i,\mu}(r) \le 8\Big(1 - \max_{j \in [q]} \mu(j)\Big) \cdot \max_x \|r(x)\|^2.$$ 

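Proof. First, note that for each $D \in \mathbb{N}$ we have $\mathrm{Inf}^{<D}_{i,\mu}(r) \le \mathrm{Inf}_{i,\mu}(r)$. Moreover, the influence of 
the $i$-th coordinate can be written as, 

$$\mathrm{Inf}_{i,\mu}(r) = \mathbb{E}_{x_{[R]\setminus i} \sim \mu^{R-1}}\Big[\mathrm{Var}_{x_i \sim \mu}[r(x)]\Big]. \qquad (7.9)$$ 

For each fixing of $x_{[R]\setminus i} \in [q]^{R-1}$, we will bound the variance of $r(x)$ over the choice of $x_i$ as follows, 

$$\mathrm{Var}_{x_i \sim \mu}[r(x)] = \frac{1}{2}\,\mathbb{E}_{z,z' \sim \mu}\big\|r(x_{[R]\setminus i}, z) - r(x_{[R]\setminus i}, z')\big\|^2$$ 
$$\le 4\max_{x \in [q]^R}\|r(x)\|^2 \cdot \mathbb{P}[z \neq z']$$ 
$$\le 8\max_{x \in [q]^R}\|r(x)\|^2 \cdot \Big(1 - \max_{j} \mu(j)\Big).$$ 

The result follows by using the above bound in (7.9). □ 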
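
The two elementary facts used in this proof can be checked numerically. The sketch below, for a scalar-valued function of a single coordinate (a simplifying assumption; the lemma concerns vector-valued functions), verifies that the variance equals half the expected squared difference of two independent samples, and that the probability of two independent samples differing is at most twice one minus the largest atom. All names and the sample distribution are hypothetical.

```python
def prob_distinct(mu):
    """P[z != z'] for independent z, z' drawn from mu."""
    return 1.0 - sum(m * m for m in mu)

def variance_direct(mu, values):
    """Var[f(z)] for z ~ mu, where f(i) = values[i]."""
    mean = sum(m * v for m, v in zip(mu, values))
    return sum(m * (v - mean) ** 2 for m, v in zip(mu, values))

def variance_via_pairs(mu, values):
    """(1/2) E[(f(z) - f(z'))^2] for independent z, z' ~ mu."""
    n = len(mu)
    return 0.5 * sum(mu[i] * mu[j] * (values[i] - values[j]) ** 2
                     for i in range(n) for j in range(n))

mu = [0.7, 0.2, 0.1]          # sample distribution on [q] with q = 3
values = [1.0, 0.0, -1.0]     # sample function values, |f| <= 1
var = variance_direct(mu, values)
upper = 8 * max(v * v for v in values) * (1 - max(mu))
```

Here `var` is well below `upper`, reflecting that the constant 8 in the lemma is not tight.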

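Lemma 7.24. For every $\tau, d, \epsilon$, the following holds for all large enough $R \in \mathbb{N}$: Given $p : [q]^R \to \blacktriangle_q$, 
let $r : [q]^R \to [q]$ denote a function sampled by setting $r(x) \in [q]$ from the distribution $p(x) \in \blacktriangle_q$. 
For every distribution $\mu \in \blacktriangle_q$ and $i \in [R]$, 

$$\mathbb{P}\big[\mathrm{Inf}^{<d}_{i,\mu}(r) > 2\,\mathrm{Inf}^{<d}_{i,\mu}(p) + \tau\big] \le \frac{\epsilon}{R^2}.$$ 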

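Proof. By Lemma 7.23, if $\max_{j \in [q]} \mu(j) > 1 - \tau/8$ then we will have $\mathrm{Inf}^{<d}_{i,\mu}(r) < \tau$ for all functions 
$r : [q]^R \to [q]$, since $\|r(x)\|^2 = 1$ when $r(x)$ is viewed as a basis vector. Hence the statement trivially holds 
if $\max_{j \in [q]} \mu(j) > 1 - \tau/8$. 

Suppose $\max_{j \in [q]} \mu(j) \le 1 - \tau/8$. In this case, for every multi-index $\sigma \in [q]^R$, Lemma 7.21 
implies that 

$$\mathbb{P}\big[|\hat{r}_\sigma - \hat{p}_\sigma| > \beta\big] \le 2\exp\Big(-\frac{\beta^2}{(1 - \tau/8)^R}\Big).$$ 

By a union bound over all $\sigma$ with $|\sigma| < d$, we will have 

$$\mathbb{P}\big[\exists\, \sigma \text{ with } |\sigma| < d : |\hat{r}_\sigma - \hat{p}_\sigma| > \beta\big] \le 2(qR)^d \exp\Big(-\frac{\beta^2}{(1 - \tau/8)^R}\Big).$$ 

Fix $\beta = (qR)^{-d}$. Fix $R$ large enough so that $2/(qR)^d \le \tau$ and the above probability bound is 
smaller than $\epsilon/R^2$. If $|\hat{r}_\sigma - \hat{p}_\sigma| \le \beta$ for all $\sigma$ with $|\sigma| < d$, then 

$$\mathrm{Inf}^{<d}_{i,\mu}(r) \le \sum_{\sigma_i \neq 0,\, |\sigma| < d} (|\hat{p}_\sigma| + \beta)^2 \le \sum_{\sigma_i \neq 0,\, |\sigma| < d} \big(2\hat{p}_\sigma^2 + 2\beta^2\big) \le 2\,\mathrm{Inf}^{<d}_{i,\mu}(p) + 2(qR)^{-2d}(qR)^d \le 2\,\mathrm{Inf}^{<d}_{i,\mu}(p) + \tau.$$ 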


□ 
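
To get a feel for how large $R$ must be in this proof, the sketch below searches for the first $R$ at which two conditions hold: $2/(qR)^d \le \tau$, which makes the error term $2\beta^2 (qR)^d$ at most $\tau$ for $\beta = (qR)^{-d}$, and the union bound $2(qR)^d \exp(-\beta^2/(1-\tau/8)^R)$ is at most $\epsilon/R^2$. It is a heuristic illustration with hypothetical names and sample parameters; finding the first such $R$ does not by itself prove the conditions for all larger $R$.

```python
import math

def log_failure_bound(q, d, tau, R):
    """Natural log of 2 (qR)^d exp(-beta^2 / (1 - tau/8)^R), with beta = (qR)^(-d)."""
    # Log of the exponent beta^2 / (1 - tau/8)^R, kept in log space to avoid overflow.
    log_exponent = -2 * d * math.log(q * R) - R * math.log(1 - tau / 8)
    # Capping keeps math.exp finite; it can only make the returned bound larger.
    exponent = math.exp(min(log_exponent, 700.0))
    return math.log(2) + d * math.log(q * R) - exponent

def first_good_R(q, d, tau, eps, R_max=10**6):
    """First R with 2/(qR)^d <= tau and failure bound <= eps/R^2, or None if not found."""
    for R in range(1, R_max + 1):
        small_beta = 2 / (q * R) ** d <= tau
        small_failure = log_failure_bound(q, d, tau, R) <= math.log(eps) - 2 * math.log(R)
        if small_beta and small_failure:
            return R
    return None

# Sample parameters (hypothetical).
R0 = first_good_R(q=2, d=1, tau=0.5, eps=0.1)
```

Even for these small parameters the required $R$ is in the hundreds, because the exponential decay of $(1-\tau/8)^R$ has to overcome the polynomially small $\beta^2$.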


Now we are ready to prove Lemma 7.20. 

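Proof. (Proof of Lemma 7.20) Fix a distribution $\mu \sim \Phi$. Call a probability distribution $\mu \sim \Phi$ 
bad (for the sampled function $r$) if 

$$\max_i \mathrm{Inf}^{<d}_{i,\mu}(r) > 2\max_i \mathrm{Inf}^{<d}_{i,\mu}(p) + \tau.$$ 

For each distribution $\mu \in \blacktriangle_q$, by Lemma 7.24 and a union bound over $i \in [R]$, 

$$\mathbb{P}_{r}[\mu \text{ is bad}] \le \epsilon/R.$$ 

By Markov's bound, 

$$\mathbb{P}_{r}\Big[\mathbb{E}_{\mu \sim \Phi}\,\mathbb{1}[\mu \text{ is bad}] > \tau\Big] \le \frac{\epsilon}{R\tau}.$$ 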

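Let us suppose $\mathbb{E}_{\mu \sim \Phi}[\mathbb{1}[\mu \text{ is bad}]] \le \tau$. In this case, we conclude that, 

$$\mathbb{E}_{\mu \sim \Phi}\Big[\max_i \mathrm{Inf}^{<d}_{i,\mu}(r)\Big] \le \mathbb{E}_{\mu \sim \Phi}\Big[2\max_i \mathrm{Inf}^{<d}_{i,\mu}(p) + \tau\Big] + 2\,\mathbb{E}_{\mu \sim \Phi}\big[\mathbb{1}[\mu \text{ is bad}]\big] \qquad (7.10)$$ 

$$\le 2\tau + \tau + 2\tau = 5\tau, \qquad (7.11)$$ 

where we used the fact that for every distribution $\mu \in \blacktriangle_q$ and every function $r : [q]^R \to [q]$ we have 
$\max_i \mathrm{Inf}^{<d}_{i,\mu}(r) \le 2$. The claim follows from (7.10) and (7.11). □ 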